FOREWORD

The Technical Papers are explanatory manuals for the use of both instructors and
students. They are expository presentations of available information on each
subject designed to encourage innovation in teaching methods and materials. These
Technical Papers are developed, printed, and distributed by the Commission on
College Geography under the auspices of the Association of American Geographers
with National Science Foundation support. The ideas presented in these papers do
not necessarily imply endorsement by the AAG. Single copies are mailed free of
charge to all AAG members.
PREFACE

In developing this Bibliography, I have become aware of the tremendous
explosion of statistical applications in almost all branches of geographic research.
An original core listing expanded to almost four times its original size, reflecting an
attempt on my part to include references from areas of research which I normally
did not cover in teaching, as well as more recent work. The result has been a rather
extensive coverage of many more topics than could be covered in the time available
for a two-semester course. Perhaps the first twenty topics listed, together with
simple correlation and regression, could be handled in such a time period. The
remaining topics would then form background for advanced undergraduate or
graduate courses.

A caveat should be entered on the utility of such a bibliography. In my
experience, any listing of reference material is of limited value unless it is closely
tied to a lecture series, and to exercises related to a particular topic. At various
check-points in the development of the course it also seems valuable to ask for
reading reports on a group of references, "bad" as well as "good," to validate
student progress. In the early rush to print quantitatively oriented articles, it is now
clear that some major problems in the application of statistical techniques to spatial
data were not recognized. I feel it is important for the student to realize this, and
therefore many of these early articles are included here. If these criticisms are
placed in the correct perspective of the development of the use of such methods, as
in Sections 1-3 in the Bibliography, then some degree of maturity may have been
attained in this branch of the subject.

The original set of references was drawn up for use in a course entitled
"Introduction to Geographic Theory and Quantitative Methods," to which students
at McGill University, the University of California at Berkeley, and the Swiss Federal
Institute of Technology at Zurich have been subjected over the years. For
encouraging reactions to that earlier course outline, I am grateful to T. Lloyd, L. J.
King, L. A. Brown, R. A. Murdie, M. F. Dacey, P. R. Gould, and G. Rushton. For
help with the revision, I would like to thank several colleagues at York University:
I. F. Owens, J. Spence, C. D. Morley, G. B. Norcliffe, and D. R. Ingram. Finally,
I owe a great debt to Gillian Gilmour, not only for constructive criticism of this
final version, but also for valuable aid in the development of that earlier set of
materials, and to Les King, for his expert assistance in editing and revising an earlier
draft.
INTRODUCTION: THE NATURE OF THE BIBLIOGRAPHY

In a recent review article, Krumbein (1968) recognizes three stages of statistical
development: 1) descriptive statistics, in which the sample is the focus of interest;
2) analytical statistics, in which the population is the primary interest; and 3) the
application of stochastic process models. In these terms, only the first two stages
are treated in this Bibliography. Additionally, it is very important to realize that
statistical techniques are only tools to the further understanding of substantive
problems. The first three introductory sections of the Bibliography underline this
statement.

Although we are primarily concerned with the application of various statistical
techniques in geography, these efforts must be based securely on an understanding
of probability theory. Notions of a sample space, expected values, random
variables, population and sampling distributions, etc., are today usually learned by
students before they register for any course in statistical methods within the
geography department. They are therefore not treated in this Bibliography,
although several general texts are listed in Section 4. (The reader is referred to the
account of probability, and especially population distributions, given by Krumbein
and Graybill, pp. 89-115, referenced in that section.) Probability concepts are also
basic to an understanding of sampling theory (Section 11). Since the applicability
of many statistical techniques depends intimately on the nature of the sample and
on assumptions about the universe from which it was drawn, Section 11 assumes a
major place in the Bibliography. The problems and benefits of sampling theory are
generally neglected in geographic research, yet the applications of all the more
powerful parametric tests depend on a sound sample design.

While a glance at the list of section headings will enable the reader to see what
has been defined as "statistical applications" in this Bibliography, a statement
concerning what has not been included may also be valuable, at least in indicating
other major areas of vital research where quantitative techniques are important. As
mentioned above, stochastic models have generally been excluded. The interested
reader is referred to a long list of publications by Dacey on point processes as one
avenue of approach in this field. (Contributions made by Dacey in the study of
point patterns are reviewed by King, 1969, pp. 32-59 and 226-230, referenced in
Section 4.) Simulation models such as those used in studying the diffusion of
innovations (Hagerstrand, 1967; Brown, 1968), probability models using a Bayesian
approach (Curry, 1966), and Markov Chain models (Brown, 1970) are similarly
excluded. Optimization techniques are not covered, and the interested reader is
referred to an introductory article on the use of linear programming (Cox, 1965),
the review of dynamic programming (MacKinnon, 1970), and a recent book which
emphasizes combinatorial programming (Scott, 1971). The Theory of Games is
excluded (for example, Isard et al., 1970), as well as important advances in our
knowledge of the relationships between topology and geography in the application
of graph theory. (For an application to transportation networks, see Werner, 19^;
and for fusion of this approach with probabilistic models, see a study of river
networks, Shreve, 1967.) Finally, concerning behavioral approaches to geographic
research, there is only limited reference in the Bibliography to some of the scaling
problems involved in space preference measures. For some such work the reader is
referred to the excellent review by Craik (1968), and for a discussion of one
technique based on personal construct theory to Golant and Burton (1970).
ORGANIZATION OF THE BIBLIOGRAPHY

For each of the Sections, a short introductory statement is made concerning the
general nature of problems investigated using that technique, where applicable.
Some reference is also made to those articles that are relevant in my experience.
The introduction is not aimed at being a statistical primer but is rather concerned
with the concepts involved.

The following list of references will, in most cases, not deal entirely with the
section topic; this applies in particular to the more recent articles in which
statistical tests are often incorporated into the research design. In some cases where
books or monographs are referenced, the relevant page numbers are indexed.

The heading SEE ALSO refers to articles that have been referenced in sections
prior to the one in question. The notation used is as follows: Section number.
Author (Date).

The heading REFERENCE indicates that some of the books or articles listed in
the Sections of Reference (4) or Review Articles (5) have material on the topic in
hand. The notation used is: Author (pages). From Section 4 the following texts are
used: Gregory (1968), King (1969), and Krumbein and Graybill (1965); while the
following review articles are listed where relevant: Barry (1963), Chorley (1966),
Hart and Salisbury (1965), and Strahler (1954).



OTHER BIBLIOGRAPHIC SOURCES

ANDERSON, M. A Working Bibliography of Mathematical Geography. (Michigan
    Inter-University Community of Mathematical Geographers. Discussion Paper
    No. 2). Ann Arbor, Mich.: Department of Geography, University of Michigan,
    1963. 52 pp.
HOWARD, J. C. Bibliography of Statistical Applications in Geology. (C.E.G.S.
    Programs, Publication Number 2). Washington, D.C.: American Geological
    Institute, 1968. 24 pp.
PITTS, F. R. (ed.). Current Research Notes in Quantitative and Theoretical
    Geography. Honolulu, Hawaii: Social Science Research Institute, University
    of Hawaii. Number 1, August 1971, continuing (contains bibliographic
    materials).
PORTER, P. W. A Bibliography of Statistical Cartography. Minneapolis, Minn.:
    Department of Geography, University of Minnesota, 1963. 66 pp.
SECTION 1

DEVELOPMENTS IN GEOGRAPHIC METHODOLOGY

The use of rigorous scientific methods of research with a statistical foundation
commenced in human geography in North America on a large scale scarcely more
than fifteen years ago. Although there had been some quantitative work earlier, for
example centrography and social physics (see Section 19), the value of a statistical
approach was first clearly stated in the articles by McCarty (1956) and Garrison
(1956).

Some of the readings are "position papers" at various points in time; compare
Ackerman (1958) and Kohn (1970). Others deal more explicitly with the evolution
of the quantitative approach (LaValle et al., 1967) and especially its relevance for
the development of geographic theory (Burton, 1963). The relationships between
traditional (qualitative) and modern geography are discussed by Spate (1960) and
Mackin (1963). The vitality of this debate, especially among schools of geography
outside of the United States, is reflected in the recent book edited by French and
Racine (1971). Even so, the inadequacies of older methods had been clearly
demonstrated more than a decade before (McCarty and Salisbury, 1961).

Apart from individual leaders generating change within the discipline, geography
has shared with many other physical and social sciences an increased amount of
information available for research. This information explosion has forced
researchers to inquire into the organization of spatial data (see Section 8), as well as
to make use of modern technology in storing and manipulating data (Section 10). A
beneficial, and entirely natural, outcome of these external influences is seen in our
ability to test hypotheses in a manner almost inconceivable ten years ago.

While this general change in geographic methodology contains a set of
techniques that must be mastered in order to understand much of the literature
today, the need for clarity and simplicity in the conceptual underpinnings of the
discipline has been well demonstrated. The article by Thomas (1964), originally
prepared for a high school course, provides an excellent example of this most
important by-product of changes in methodology.

ACKERMAN, E. A. Geography as a Fundamental Research Discipline. (Research
    Paper No. 53). Chicago: Department of Geography, University of Chicago,
    1958.
ANUCHIN, V. A. "Mathematization and the Geographic Method," Soviet Geography:
    Review and Translation, Vol. 11, 1970, pp. 71-81.
BURTON, I. "The Quantitative Revolution and Theoretical Geography," The
    Canadian Geographer, Vol. 7, 1963, pp. 151-162.
CHAPMAN, J. D. "The Status of Geography," The Canadian Geographer,
    1966, pp. 133-144.
CHORLEY, R. J. "The Application of Quantitative Methods to Geomorphology,"
    in Chorley, R. J. and P. Haggett (eds.), Frontiers in Geographical Teaching.
    London: Methuen, 1965, pp. 147-163.
COPPOCK, J. T. and J. JOHNSON. "Measurement in Human Geography,"
    Economic Geography, Vol. 38, 1962, pp. 130-137.
FRENCH, H. M. and J-B. RACINE (eds.). Quantitative and Qualitative Geography:
    La Nécessité d'un Dialogue. (Occasional Papers, Department of Geography,
    No. 1). Ottawa, Canada: University of Ottawa Press, 1971.
GARRISON, W. L. "Applicability of Statistical Inference to Geographical Research,"
    Geographical Review, Vol. 46, 1956, pp. 427-429.
GARRISON, W. L. "Some Confusing Aspects of Common Measurement," The
    Professional Geographer, Vol. 8, 1956, pp. 4-5.
GOULD, P. R. "Methodological Developments Since the Fifties," Progress in
    Geography, Vol. 1, 1969, pp. 3-49.
GREGORY, S. "The Quantitative Approach in Geography," in French, H. M. and
    J-B. Racine (eds.), Quantitative and Qualitative Geography: La Nécessité d'un
    Dialogue. (Occasional Papers, Department of Geography, No. 1). Ottawa,
    Canada: University of Ottawa Press, 1971, pp. 25-33.
GRIFFITHS, J. C. "Current Trends in Geomathematics," Earth Science Reviews,
    Vol. 6, 1970, pp. 121-140.
KOHN, C. F. "The 1960's: A Decade of Progress in Geographical Research and
    Instruction," Annals, AAG, Vol. 60, 1970, pp. 211-219.
LAVALLE, P., H. McCONNELL, and R. G. BROWN. "Certain Aspects of the
    Expansion of Quantitative Methodology in American Geography," Annals,
    AAG, Vol. 57, 1967, pp. 423-436.
MACKIN, J. H. "Rational and Empirical Methods of Investigation in Geology," in
    Albritton, C. C. Jr. (ed.), The Fabric of Geology. Reading, Mass.: Addison
    Wesley, 1963, pp. 135-163.
McCARTY, H. H. "Use of Certain Statistical Procedures in Geographical Analysis,"
    Annals, AAG, Vol. 46, 1956, p. 263.
McCARTY, H. H. and N. E. SALISBURY. Visual Comparison of Isopleth Maps as a
    Means of Determining Correlations Between Spatially Distributed Phenomena.
    (Monograph No. 3). Iowa City: Department of Geography, University of
    Iowa, 1961.
PATTISON, W. D. "The Four Traditions of Geography," Journal of
    Geography, Vol. 63, 1964, pp. 211-216.
REYNOLDS, R. B. "Statistical Methods in Geographical Research," Geographical
    Review, Vol. 46, 1956, pp. 129-132.
SPATE, O. H. K. "Quality and Quantity in Geography," Annals, AAG, Vol. 50,
    1960, pp. 377-394.
THOMAS, E. N. "Some Comments About a Structure of Geography With Particular
    Reference to Geographic Facts, Spatial Distribution, and Areal Association,"
    in Kohn, C. F. (ed.), Selected Classroom Experiences: High School Geography
    Project. Chicago: National Council on Geographic Education, 1964,
    pp. 44-60.
SECTION 2

MODELS IN GEOGRAPHIC RESEARCH

As an activity, model building has become an increasingly important aspect of
geographic research, concomitant with developments of a more technical nature
outlined in the preceding section. Although models can be defined in many ways, a
common feature is that they abstract from reality certain elements that are
amenable to analysis. The way in which this transformation is carried out, and
indeed the choice of which factors to include in the model, naturally rest upon
substantive interpretations made by the researcher with respect to the problem in
hand.

The process could be carried out in a qualitative manner, seeking to express
likely relationships between factors on the basis of logical reasoning; such an
activity is likely to lead to a diagrammatic representation of the relationships, as a
conceptual model. Alternatively, the relationships could be specified 1) mathematically
(leading to a mathematical model), 2) by constructing a smaller or larger
physical representation (scale or iconic model), or 3) by transfer to another
medium, for example substituting electrical impulses for flow phenomena (analogue
model). A statistical model is a particular form of mathematical model. It is a
mathematical expression, with variables, parameters, and constants, and in addition
it contains one or more random components. In geography, statistical models are
usually emphasized since much research is empirical in nature, and the random
components are due in large part to sampling variability and measurement error.
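
The definition of a statistical model given above can be illustrated with a minimal Python sketch. It is not drawn from any of the readings: the function names, the parameter values, and the choice of a straight-line relationship with a normally distributed random component are all assumptions made for illustration. The sketch generates observations from a mathematical expression plus a random term, then recovers the parameters by ordinary least squares.

```python
import random


def simulate_linear_model(xs, intercept, slope, noise_sd, seed=0):
    """Generate y = intercept + slope * x + e, where e is a random component.

    The normal distribution for e is an illustrative assumption; it stands in
    for sampling variability and measurement error.
    """
    rng = random.Random(seed)
    return [intercept + slope * x + rng.gauss(0.0, noise_sd) for x in xs]


def fit_least_squares(xs, ys):
    """Estimate the intercept and slope of a straight line by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: 50 observations with known parameters and unit noise.
xs = [float(i) for i in range(50)]
ys = simulate_linear_model(xs, intercept=2.0, slope=0.5, noise_sd=1.0)
a_hat, b_hat = fit_least_squares(xs, ys)
print(f"estimated intercept={a_hat:.3f}, slope={b_hat:.3f}")
```

Setting `noise_sd` to zero reduces the sketch to a purely mathematical (deterministic) model, in which case the fitted values reproduce the parameters exactly; the random component is what makes the model statistical.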

The readings reflect two major influences upon the increased use of models in
geography. First, the desires of researchers to understand the linkages that exist
between elements of geographic systems are represented in a series of review articles
in a seminal volume edited by Chorley and Haggett (1967). The second influence
has emanated from outside the discipline, as other natural and social sciences
overlap at the frontiers of geographic endeavor; see Ackerman (1963), the
NAS/NRC publication The Science of Geography (1965), and Taaffe (1970). In
addition, both Haggett (1965) and King (1966) place model building approaches
into a larger substantive framework. Duncan, Cuzzort, and Duncan (1961) were
among the first to point out many problems in forming models using areally based
data, particularly the aggregation factor (scale problem) and its effects on
explanation at different levels of inquiry (Dogan and Rokkan, 1969).

Several references in Section 4 (Reference) are useful in this area: Ackoff (1953,
SECTION 3

GEOGRAPHIC THEORY

The most important aspect of changes in geographic methodology has been a
revitalized interest in the role of theory in the discipline. Although earlier debate
over determinism and possibilism (Lukermann, 19j^) can be viewed as one example
of prior concern over theory, the methods used to test general propositions before
the mid-1950's were idiographic in their orientation. Schaefer's (1953) denunciation
of this underlying weakness laid the foundations for more concise methods of
explanation, but the development of statistical models in geography progressed at
such a rate that explanation based on purely methodological grounds was no longer
acceptable. Harvey's (1969) major review of this process indicates clearly that the
explicit desire for more adequate explanation is at the same time a concerted effort
in the development of theory, even though the latter may be largely derivative, the
spatial equivalent of essentially temporally-based theories in other natural and
social sciences.

It seems that there are three major fronts on which the difficult problems of
theory construction in geography are being broached. First, there is an extension of
Schaefer's contribution with much greater emphasis on the philosophical foundation
of the discipline. In particular, the dual activity of explanation and prediction
is being subject to much debate; see the important contributions by Harvey, Olsson
(1969a, 1970), and Golledge and Amadeo (1968). A second approach is seen in the
concern over relations between geography and geometry (Bunge, 1962). In this
area, Nystuen (1963) has derived some basic elements of geographic space, while
Tobler (1963) has identified the fundamental role of map transformations. The
third attack is based upon stochastic process models; this appears to have particular
relevance in physical geography (Curry, 1964; Scheidegger and Langbein, 1966).

General references in Section 4 (Reference) which are relevant here are
Braithwaite (1960) and Glaser and Strauss (1967).
SECTION 4

REFERENCE

Section 4A contains geographic textbook references. Gregory's book (1968)
originally appeared at a time when courses in quantitative methods were being
introduced into undergraduate curricula and there were no prerequisites for such
courses. Today, most undergraduates are required to register for an introductory
statistics course prior to training in geographic methods. King's review of
contributions in this area (1969) needs such a background. From the point of view
of student response, Krumbein and Graybill (1965) has been found most
satisfactory because it bridges the gap between methodological and substantive
concerns in a comprehensive manner. A different approach is seen in Yeates'
subject-matter oriented text (1968).

Two important collections of articles also are indexed in Section 4A. The Berry
and Marble reader (1968) brings together a number of references, most of which
have been published before. The two volumes on Quantitative Geography, edited
by Garrison and Marble (1967), contain symposium articles which are referenced
separately in the Bibliography. Also, as indicated earlier, relevant pages from
Gregory, King, and Krumbein and Graybill are referenced at the end of sections.

More general works are listed in the second part, Section 4B. The following are
important as background references in statistical methods: Blalock (1960);
Coleman (1964); Cooley and Lohnes (1962, 1970); Freund (1967); Fryer (1966);
Guenther (1965); Huntsberger (1961); Kendall and Stuart (1963, 1966, 1967);
Morrison (1967); Seal (1964); Snedecor (1956).
SECTION 5

REVIEW ARTICLES

The articles in this section vary greatly in scope and purpose, but an attempt has
been made to cover most areas of interest to geographers. Basic explanations of the
use of statistical methods in geomorphology, for example, are illustrated in the
articles by Chorley (1966) and Strahler (1954). These, plus the references by Barry
(1963) and Hart and Salisbury (1965), have been indexed by page number in
relevant sections in the Bibliography. In terms of this listing of reviews, the range
extends from such introductory statements to fairly advanced surveys and critiques,
such as Curry (1967), Strahler (1964), and King (1969).

AMOROCHO, J. and W. E. HART. "A Critique of Current Methods in Hydrological
    Systems Investigation," Transactions, American Geophysical Union, Vol. 45,
    1964, pp. 307-321.
USGPO, 1960. (Section on "Climatological Statistics," pp. 46-67).
Society of America, Vol. 67, 1956, pp. 571-596.

SEE ALSO: Section 1. Chorley (1965); Gould (1969).
          Section 2. Haggett (1965); King (1966).



SECTION 6

MEASUREMENT

Since statistics may be defined as a branch of applied mathematics that
concentrates on the development of procedures for describing and reasoning from
observations, the ways in which such observations are recorded or coded are clearly
basic to any analysis. Indeed, different ways of assigning numbers or symbols to
specify variations in characteristics of a set of objects (the process of measurement)
imply that different procedures will be applicable to certain situations. The kind
of measurement achieved is a function of the rules under which the numbers or
symbols were assigned, and these operations, in turn, define and limit the
manipulations permissible in handling the data. The manipulations and operations
must be those of the numerical structure to which the measurement is isomorphic.

Four measurement scales are recognized: nominal, ordinal, interval, and ratio. A
hierarchy of levels of measurement is represented here. Nominal scales are the
weakest type of measurement and, in a sense, render least information about the
observations, whereas the ratio scale is the highest level of measurement,
isomorphic to the numerical structure of arithmetic. In addition, each higher level
of measurement incorporates all the attributes of lower levels; for example, ordinal
scales have all the characteristics of nominal scales (equivalence relations among
members of a designated sub-class) as well as their own properties ("greater than"
or "less than" relations).

In the nominal or classificatory scale, the assignment process is carried out with
the purpose of designating sub-classes which represent unique characteristics.
Ordinal scales (ranking) have the additional purpose of identifying ordered relations
of some characteristic. The order itself has unspecified intervals, i.e., the magnitude
of differences between ordered categories cannot be established. In interval scales,
the ordered relations refer to values and are based on arbitrarily assigned, equal
intervals, but with an arbitrary zero point (e.g. temperature on Fahrenheit or
Centigrade scales). Thus the focus is on differences between values on a scale,
and the normal rules of arithmetic can be applied to such intervals. Finally, the
ratio scale incorporates all other properties of lower levels, and has an absolute zero
point, permitting all arithmetic manipulations.
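
The hierarchy of scales described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not a scheme taken from any of the references: the class name `Scale`, the operation labels, and the helper functions are hypothetical. The second part uses the temperature example to show why ratios of interval-scale values are not meaningful while ratios of their differences are.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four measurement scales, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level (illustrative labels).
NEW_OPERATIONS = {
    Scale.NOMINAL: {"equality", "mode"},
    Scale.ORDINAL: {"ranking", "median"},
    Scale.INTERVAL: {"difference", "mean"},
    Scale.RATIO: {"ratio"},
}


def permissible_operations(scale):
    """Each level inherits every operation of the levels below it."""
    ops = set()
    for level in Scale:
        if level <= scale:
            ops |= NEW_OPERATIONS[level]
    return ops


def centigrade_to_fahrenheit(c):
    """Convert between two interval scales with different arbitrary zeros."""
    return c * 9.0 / 5.0 + 32.0


# Ratios of values change with the arbitrary zero point ...
ratio_c = 20.0 / 10.0
ratio_f = centigrade_to_fahrenheit(20.0) / centigrade_to_fahrenheit(10.0)

# ... but ratios of differences (intervals) do not.
diff_ratio_c = (30.0 - 20.0) / (20.0 - 10.0)
diff_ratio_f = (
    (centigrade_to_fahrenheit(30.0) - centigrade_to_fahrenheit(20.0))
    / (centigrade_to_fahrenheit(20.0) - centigrade_to_fahrenheit(10.0))
)

print(sorted(permissible_operations(Scale.ORDINAL)))
print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f)
```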

Some scales appear to be intermediate between these four types. One example
might be the Likert scale, which is an attitudinal measure, eliciting responses to a
question on the basis of five to seven classes, from "strongly disagree" to "strongly
agree." In this case, an ordering has been imposed on what is probably an
underlying continuous variable, so that the scale seems to lie between ordinal and
interval scales as defined above. Indeed, many problems in geographic research
using a behavioral approach are linked to the scaling question. Gould (1969) and
Rushton (1969) discuss techniques for the conversion of non-metric information to
a metric base. Other readings (for example, Hodge, 1963, and Krumbein, 1958)
clearly indicate the implications of different scales for subsequent analysis. The use
of dimensionless measures is described by Schumm (195^).
STRAHLER, A. N. "Dimensional Analysis Applied to Fluvially Eroded Landforms,"
    Bulletin of the Geological Society of America, Vol. 69, 1958, pp.
    279-300.

SEE ALSO: Section 1. Coppock and Johnson (1962); Garrison (1956).
REFERENCE: Section 4A. King (8-13; 230-231); Krumbein and Graybill (34-38;
           44-53).
           Section 5. Chorley (285-292).

SECTION 7

SET THEORY

Set theory is an integral part of finite mathematics and can be used as a
foundation for the development of concepts appropriate to probability theory. If
this material is not covered in prerequisite courses for the student, it can be
introduced in an informative context as the logical basis of the regionalization
problem in geography (see Golledge and Amadeo, 1966). References cited in
Section 4B dealing with finite mathematics (especially Kemeny, Snell, and
Thompson, 1965, and Robinson, 1969) and probability theory (especially
Goldberg, 1960, and Mosteller, Rourke, and Thomas, 1961) exhibit the transition
from set theory to probability concepts in an excellent manner for those geography
courses in which it is necessary to establish the bases of all statistical reasoning.
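
One way to make the link between set theory and regionalization concrete is to treat a regional division as a partition of a set of places, that is, a collection of non-empty subsets that are mutually exclusive and jointly exhaustive. The Python sketch below is offered only under that assumption; the function name `is_partition` and the county labels are hypothetical and are not taken from the readings.

```python
def is_partition(universe, regions):
    """Return True if the regions are non-empty, disjoint, and cover the universe."""
    covered = set()
    for region in regions:
        # An empty region, or one sharing a place with an earlier region,
        # violates the definition of a partition.
        if not region or covered & region:
            return False
        covered |= region
    return covered == universe


# Hypothetical set of places to be regionalized.
counties = {"a", "b", "c", "d"}

valid_division = [{"a", "b"}, {"c", "d"}]
overlapping_division = [{"a", "b"}, {"b", "c", "d"}]
incomplete_division = [{"a", "b"}, {"c"}]

print(is_partition(counties, valid_division))
print(is_partition(counties, overlapping_division))
print(is_partition(counties, incomplete_division))
```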

COLE, J. P. Set Theory and Geography. (Bulletin of Quantitative Data for
    Geographers No. 2). Nottingham, U.K.: Department of Geography,
    Nottingham University, 1966.
GOLLEDGE, R. G. and D. M. AMADEO. "Some Introductory Notes on Regional
    Division and Set Theory," The Professional Geographer, Vol. 18, 1966, pp.
    14-19.
HAMILL, L. "A Note on Tree Diagrams, Set Theory and Symbolic Logic," The
    Professional Geographer, Vol. 18, 1966, pp. 224-226.
SHEAR, J. A. "A Set-Theoretic View of the Koppen Dry Climates," Annals, AAG,
    Vol. 56, 1966, pp. 508-515.
WARNTZ, W. "Some Elementary and Literal Notions About Geographical
    Regionalization and Extended Venn Diagrams," in Bunge, W. et al., The
    Philosophy of Maps. (Michigan Inter-University Community of Mathematical
    Geographers. Discussion Paper No. 12). Ann Arbor, Mich.: Department of
    Geography, University of Michigan, 1968, pp. 7-30.

SECTION 8

GEOGRAPHIC DATA

Sets of observations on variables of particular interest to geographers are ordered
in a manner peculiar to the discipline; any unit of observation not only has a value
on the particular attribute in question and a temporal characteristic but also has the
property of geographic location. This is usually expressed as the geographers'
primary concern with maps, but the exact specification of the locational
coordinates presents some difficulties (Tobler, 1963).

Ideally, data should be referenced by point locations but practically, some
aggregation factor is applied to produce a set of quadrats (cells). Since aggregation
is a continuous function and statements made at different scales of aggregation can
differ, a basic problem in geographic research is revealed. The use of point
locations (the process of "geocoding"; Tomlinson, 1967) in recent census
operations has opened up new avenues of research (see the excellent series of
publications by the U.S. Bureau of the Census). Forbes and Robertson (1967)
report on the use of quadrats in census operations. A more general approach is
given by Haggett, Chorley, and Stoddart (1965).
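
The effect of the aggregation factor can be seen in a minimal Python sketch that counts point-referenced observations in square quadrats of two different sizes. The coordinates, the cell sizes, and the function name `aggregate_to_quadrats` are invented for illustration; the sketch assumes a simple square grid anchored at the origin, which is only one of many possible quadrat schemes.

```python
from collections import Counter


def aggregate_to_quadrats(points, cell_size):
    """Count points falling in each square quadrat of side cell_size.

    Cells are indexed by (column, row), starting from the origin.
    """
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical point locations (e.g. geocoded households).
points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (3.5, 3.5)]

fine = aggregate_to_quadrats(points, cell_size=1.0)
coarse = aggregate_to_quadrats(points, cell_size=2.0)

# At the fine scale no cell holds more than one point; at the coarse scale
# three of the four points fall in a single cell.
print(dict(fine))
print(dict(coarse))
```

The same four points appear evenly spread at one scale and clustered at the other, which is the sense in which statements made at different scales of aggregation can differ.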

Census bureaus are the primary sources of information used in most
economic-urban geographic research. It is therefore very important to understand the
rationale behind the types of information collected (Fay and Klove, 1970), since
these data may be crude approximations (operational definitions) of conceptual
factors. Aggregations of unit characteristics, such as occupational groupings or
industrial classifications, should be considered (Sweet, 1970) as well as the problem
of using census-defined unit areas (see Morrill, 1969, and Abler, 1970). Potentially
important new sources of information are being developed using airborne sensors
(Cooke and Harris, 1970; Moore and Wellar, 1969).

The greater availability of all types of information amenable to spatial analysis
has forced the geographer to consider seriously all aspects of collection, storage,
and manipulation of data in a systems framework, using the computer (Section 10).
The series of reports by Dueker (1966) addresses this area of overall data
organization.

ABLER, R. "Zip-Code Areas as Statistical Reborn '"The Professional Geq^pher, 
• Vol. 22, 1976, pp. 270-274. 

COOKE, R. U. and D. R. HARRIS. "Remote Sensing of the Terrestrial . 
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DUEKER, K. J. Spatial Data Systems: Organization of Spatial Data. (Urban and 
Transportation Information Systems, Te.chnical ReportKo. 4). Evanston, 111.: 
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tion Systems, /Technical F^eport No. 6). Evanston, 111.: Department of ' 
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FAY, W. T. and R. C. KLOVE. "The 1970 Census," The Professional Geographer, 
Vol. 22,1970, pp. 284-289.-^ / . * 

FORBES, J. and I. M. L. ROBERTSON. "Population Enumeration on a Grid
    Journal, Vol. 4, 1967, pp. 29-3?.

HAGGETT, P., R. J. CHORLEY, and D. R. STODDART. "Scale Standards in
    205, 1965, pp. 844-847.
U.S. BUREAU of the CENSUS. Census Use Study: Family Health Survey. (Report
    No. 6). Washington, D.C.: USGPO, 1969, ?1 pp.

U.S. BUREAU of the CENSUS. Census Use Study: Health Information System.
    (Report No. 7). Washington, D.C.: USGPO, 1969, 67 pp.

U.S. BUREAU of the CENSUS. Census Use Study: Data Uses in Urban Planning.
    (Report No. 9). Washington, D.C.: USGPO, 1970, 28 pp.

    System. Federal Activities and Specialized Programs. Kent, Ohio: Kent State
    University, 1969, pp. 25?-277.

REFERENCE: Section 4A. Krumbein and Graybill (33-34).
THE GEOGRAPHIC MATRIX

Once the information required for any problem has been collated, how can one
organize it most effectively for analytical purposes? A matrix format is
commonly employed, especially since the data are most efficiently stored in
this form for computer analysis (Section 10). Two types of geographic data,
with different matrix forms, can be recognized: an attribute matrix, of order
n-places by m-attributes or characteristics of the places; and an interaction
matrix, of order n-places by n-places. Attribute matrices are most commonly
employed in the literature since they refer to easily available data in
comparison to the inter-place flows (of information, goods, money, people,
etc.) indexed by the interaction matrix (Smith, 1970).

Many of the regular statistical procedures can be related effectively to this
matrix organization; for example, descriptive statistics (Section 12) can be
viewed as the outcome of manipulations on any one of the columns of the
matrix, while correlation (Section 21) is the covariance of two or more
columns. Since any column is a means, in addition to the map, of expressing a
spatial distribution, the relevance of a matrix format is evident. Similarly,
the grouping of rows of the matrix can be likened to the process of
classification/regionalization (Section 32).
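
The two matrix forms can be illustrated with a minimal Python sketch. All
place names, attribute values, and flows below are hypothetical, and the
helper functions (`column`, `covariance`, `correlation`) are illustrative
names, not part of any cited program. Correlation is computed here as the
covariance of two columns scaled by their standard deviations.

```python
from statistics import mean

# Hypothetical attribute matrix: 4 places (rows) by 2 attributes (columns).
places = ["A", "B", "C", "D"]
attributes = [
    [10.0, 2.0],
    [20.0, 4.0],
    [30.0, 5.0],
    [40.0, 9.0],
]

def column(matrix, j):
    """Return column j of a row-major matrix as a list."""
    return [row[j] for row in matrix]

def covariance(x, y):
    """Population covariance of two equally long columns."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Covariance scaled by the standard deviations of both columns."""
    return covariance(x, y) / (covariance(x, x) ** 0.5 * covariance(y, y) ** 0.5)

# Hypothetical interaction matrix: flows from place i (row) to place j (column).
flows = [
    [0, 5, 2, 1],
    [4, 0, 3, 2],
    [1, 6, 0, 7],
    [0, 2, 8, 0],
]

col_mean = mean(column(attributes, 0))      # descriptive statistic of one column
r = correlation(column(attributes, 0), column(attributes, 1))
outflow = [sum(row) for row in flows]       # row totals: flows leaving each place
inflow = [sum(column(flows, j)) for j in range(len(flows))]  # column totals
```

Row and column totals of the interaction matrix give the total outflow and
inflow of each place; their grand totals must agree.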

Apart from this operational utility, the matrix approach has certain benefits
of a conceptual nature. For example, Berry (1964) was able to categorize much
of the work in regional geography using a matrix organization; a later
extension of this argument called for a synthesis of formal and functional
regionalization schemes (Berry, 1968). The conceptual framework employed in
this latter case was field theory, originally developed in psychology. Some
difficulties in translating these ideas in a geographic context are outlined
by Greer-Wootten (1971).

The readings also include a reference to the use of matrices from an
analytical point of view (Gould, 1967), although matrix algebra itself is a
foundation for much work in multivariate statistics (Section 24). Note also
that Haggett and Chorley (1967) have proposed an alternative form of the
geographic matrix to make it more relevant for model building; the axes of
their matrix are locational relativity and topological-geometric form.

BERRY, B. J. L. "Approaches to Regional Analysis: A Synthesis," Annals, AAG,
    Vol. 54, 1964, pp. 2-11.

    Spatial Analysis. Englewood Cliffs, N.J.: Prentice-Hall, 1968, pp. 419-428.

GOULD, P. R. "On the Geographic Interpretation of Eigenvalues," Transactions of
    the Institute of British Geographers, No. 42, 1967, pp. 63-86.

GREER-WOOTTEN, B. "Some Reflections on Systems Analysis in Geographic
    Research," in French, H. M. and J-B. Racine (eds.). Quantitative and
    Qualitative Geography: La Necessite d'un Dialogue. (Occasional Papers,
    Department of Geography, No. 1). Ottawa: University of Ottawa Press, 1971,
    pp. 151-174.

HAGGETT, P. and R. J. CHORLEY. "Models, Paradigms and the New Geography,"
    in Chorley, R. J. and P. Haggett (eds.). Models in Geography. London:
    Methuen, 1967, pp. 19-41.

SMITH, R. H. T. "Concepts and Methods in Commodity Flow Analysis," Economic
    Geography, Vol. 46, 1970, pp. 404-416.

TEITZ, M. B. "Land Use Data Collection Systems: Some Problems of Unification,"
    Papers of the Regional Science Association, Vol. 17, 1966, pp. 179-194.

REFERENCE: Section 4A. Krumbein and Graybill (53-56). 



SECTION 10 
COMPUTER APPLICATIONS



The uses of punched cards as a means of recording and storing data are well
known (Melton, 1958). Benefits are evident in terms of permanence and
reproducibility of original records and for speed, neatness, flexibility, and
accuracy in manipulations. The general utility of computers in geographic
research is documented by Pitts (1962), Kao (1963), and Gould (1970), while
Haggett (1969) describes some of the implications for research. The effects on
ordering information (Section 8) have already been noted (see also
Hagerstrand, 1967, and Nordbeck, 1962).

There are two major applications of computers in geography: 1) as an aid in
the rapid portrayal of information, for example in the use of the on-line
printer for mapping purposes, such as the SYMAP system (Roang, 1969; Massey,
1970; Douglas, 1971), or the incremental plotter for line patterns (Kern and
Rushton, 1969; Monmonier, 1970); and 2) for analytical purposes, in
correlation studies, for example (Monmonier, 1971). In the latter case, it is
worth noting that much of the current research using multivariate techniques
would not be possible without access to high-speed computers; for example,
factor-analytic studies of large data matrices (Sections 24, 25, 26) and the
lengthy iterative process of classification (Section 32). Trend surface
analysis (Section 30) shows the direct link between analytical and graphic
uses of the computer.

Three departments of geography have produced useful sets of computer
programs: Iowa (Wittick, 1968), Northwestern (Marble, 1967), and Michigan
(Tobler, 1970). In addition, there are two sources of on-going research and
publication: the State Geological Survey associated with the University of
Kansas (Computer Contributions), and the Department of Geography of the
University of Nottingham, U.K. (Computer Applications in the Natural and
Social Sciences).

In Section 4B, McCracken (1965) and Organick (1966) present the basic
elements of FORTRAN programming, while Veldman (1967) lists many programs
relevant for multivariate analysis.

CRAIG, W. J. "A Computer Function to Measure Distances," The Professional
    Geographer, Vol. 23, 1971, pp. 157-159.

DOUGLAS, D. H. "Map Making with the Electronic Digital Computer," in French,
    Necessite d'un Dialogue. (Occasional Papers, Department of Geography, No.
    1). Ottawa, Canada: University of Ottawa Press, 1971, pp. 97-114.

FAIMAN, M. and J. NIEVERGELT (eds.). Pertinent Concepts in Computer
    Graphics. Urbana, Ill.: University of Illinois Press, 1969.

GOSENFELD, ... "A Computer-Based Approach to Planning in
    Underdeveloped Areas," The Professional Geographer, Vol. 17, 1965, pp.
SEE ALSO: Section 8. Hofsten (1966); Tomlinson (1967).

SECTION 11

SAMPLE DESIGNS AND METHODS 



The application of statistical methods in geography is intimately related to
sampling theory, which deals with the problem of estimating characteristics of
a population from values obtained from measurement of a sample. A population,
or universe, is any class of objects or events arbitrarily defined on the
basis of its unique and observable characteristics, while a sample is a
collection of objects selected to represent a population. In certain field
situations, and particularly in geomorphology, the conceptual population may
not be available for sampling; therefore a target population must be defined
(Krumbein, 1960; Chorley, 1966; Griffiths and Ondrick, 1968). In such cases,
it is clear that any statements made about population characteristics
(parameters) from summary sample values (statistics) will be affected by the
degree of correspondence between target and conceptual populations.

In contrast, the situation in most economic-urban geographic research that
uses areas (usually defined by the census) as sampling units is one in which
little, if any, selection process is applied. The conceptual population is
then argued to be all such sets of areas, at such a stage of economic
development, etc., that might have existed in the past or would exist in the
future, as well as those existing in the present. If the sampling unit is
defined as a coordinate point location or a quadrat (or similar geometric
unit; see Matern, 1960; Holmes, 1970), the population characteristic of
interest is usually some continuous feature such as land use, and the
procedures of areal or location sampling are appropriate (Berry and Baker,
1968; Hadfield and Orzeske, 1966; Holmes, 1967).

Sampling is usually carried out with two purposes in mind: 1) to estimate
population parameters, and 2) to test a statistical hypothesis about the
population. The standard error is used in computations for both goals and is
thus an important concept. Imagine that a sample of quadrats is selected for
measuring some attribute, say the proportion of land under wheat. From the
resulting set of proportions, a sample distribution can be drawn up, usually
graphically in the form of a frequency diagram or histogram (Section 12). An
indication of the average proportion of land under wheat can be obtained by
computing the arithmetic mean of the set of observations (Section 12). The
question that then arises is: to what extent does the sample mean (a
statistic) differ from the true population mean (parameter)? Sampling theory
provides an answer to this question by the following argument: the single
sample taken is but one of the theoretically infinite number of such samples,
and for each of these samples an arithmetic mean could also have been
computed. The complete set of sample means could be arranged in a frequency
diagram, resulting in what is known as a sampling distribution. The standard
deviation (Section 12) of a sampling distribution is known as the standard
error.
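
The argument can be made concrete with a small simulation, offered only as a
sketch. The population of wheat proportions is invented (uniform random
numbers), samples are drawn with replacement, and the function name
`sampling_distribution` is hypothetical. The standard deviation of the
simulated sample means is compared with the textbook value, the population
standard deviation divided by the square root of the sample size.

```python
import random
from statistics import mean, pstdev

def sampling_distribution(population, n, draws, seed=0):
    """Means of `draws` simple random samples of size n (with replacement)."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=n)) for _ in range(draws)]

rng = random.Random(42)
# Hypothetical population: proportion of land under wheat in 5,000 quadrats.
population = [rng.random() for _ in range(5000)]

n = 25
means = sampling_distribution(population, n, draws=2000)
empirical_se = pstdev(means)                    # sd of the simulated sample means
theoretical_se = pstdev(population) / n ** 0.5  # sigma / sqrt(n)
```

With a few thousand draws the two values should agree closely, although the
simulated figure will vary slightly with the random seed.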

If the frequencies are viewed as probabilities of occurrence of values of the
sample mean, it is known that the shape of the sampling distribution has a
particular form (at least approximately, for sufficiently large samples): the
normal curve, a continuous probability function (Section 15). Indeed, many
sample statistics have a normal distribution even though they may be derived
from parent populations that are not normally distributed. Herein lies the
value of a correctly conceived and executed sample, since the full power of
parametric statistics can then be applied. For example, it is possible to
place confidence limits around the particular sample mean obtained, so that,
at a specified level of probability, the researcher may state that the true
population mean lies in such an interval. It seems that few geographers have
utilized sampling theory to the fullest in making estimates of population
parameters. Rushton (1966) illustrates the procedures.
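
As a minimal sketch of placing confidence limits around a sample mean, the
following assumes that the normal approximation is adequate (for small samples
Student's t would normally replace the normal deviate). The data and the
function name `confidence_limits` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def confidence_limits(sample, level=0.95):
    """Normal-theory confidence limits for the population mean."""
    n = len(sample)
    se = stdev(sample) / n ** 0.5               # estimated standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for the 95% level
    m = mean(sample)
    return m - z * se, m + z * se

# Hypothetical proportions of land under wheat in 30 sampled quadrats.
wheat = [0.21, 0.35, 0.28, 0.40, 0.19, 0.33, 0.27, 0.31, 0.24, 0.38,
         0.29, 0.22, 0.36, 0.30, 0.26, 0.34, 0.23, 0.32, 0.25, 0.37,
         0.28, 0.31, 0.20, 0.39, 0.27, 0.33, 0.29, 0.35, 0.24, 0.30]
low, high = confidence_limits(wheat)
low_99, high_99 = confidence_limits(wheat, level=0.99)
```

Raising the probability level widens the interval: the 99 per cent limits
enclose the 95 per cent limits.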

The selection procedures used to draw a sample are extremely important. Most
statistical tests require that the sampling units were originally assigned
some probability of selection: the general case of probability sampling. If
these probabilities of selection were equal for all units, random sampling is
employed. Non-probability sampling (purposive or quota samples) may be
necessary in certain research areas in which there is virtually no knowledge
of population characteristics, and some non-parametric tests may be applicable
in such situations (see Section 18).

Sample designs can be viewed as sets of selection procedures, following
certain rules (sampling plan) established by the researcher. For example, if
it is clear that a population characteristic, such as slope angle, varies
directly with a known factor, such as lithology, then stratification can be
applied (Wood, 1955). The sampling fraction may or may not be equal for the
different strata (a function of within-strata variability), but the design can
only help to reduce the standard error of sample statistics, i.e. to increase
the precision. Note that the other primary factor influencing precision, the
size of the sample, would also be decided upon as an integral part of the
sample design, usually as a cost factor. Furthermore, in order to estimate the
population mean in this case, a weighted average of the strata sample means
would have to be computed. The lesson is that each sample design, and there
are many possibilities for invention in this respect, will require different
means of estimating population parameters.
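
The weighted average for a stratified design can be sketched as follows. The
strata, their sizes, and the slope angles are hypothetical, and
`stratified_mean` is an illustrative name; each stratum is weighted by its
share of the whole population.

```python
from statistics import mean

def stratified_mean(strata):
    """Weighted average of stratum sample means.

    `strata` maps a stratum name to (population_size, sample_values); the
    weight of each stratum is its share of the total population.
    """
    total = sum(size for size, _ in strata.values())
    return sum(size / total * mean(values) for size, values in strata.values())

# Hypothetical slope angles (degrees) sampled within three lithologies.
strata = {
    "sandstone": (600, [12.0, 14.0, 13.0, 15.0]),
    "shale": (300, [6.0, 7.0, 5.0]),
    "limestone": (100, [20.0, 22.0]),
}
estimate = stratified_mean(strata)
# Ignoring the strata gives a different (here, biased) figure.
unweighted = mean(v for _, values in strata.values() for v in values)
```

Because the sampling fractions differ between strata, the simple mean of all
nine observations does not equal the weighted estimate.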

The major use of sampling in geographic research has been to test hypotheses
(see Section 14), and this goal clearly places sample design into a larger
context. The possibility of using certain tests depends intimately on
selection procedures used and on certain assumptions about the nature of the
parent population from which the sample was drawn. It is clear that sample
design and statistical design (the type of analysis to be carried out) should
be incorporated into an experimental design that ensures that the ultimate
objectives of the research will be attained (Krumbein and Miller, 1953;
Krumbein, 1955; Haggett, 1964).

It seems that there are three main situations in which sampling is
particularly relevant in geographic research. First, many studies in economic
and urban geography (in which sampling units are discrete entities such as
manufacturing plants or households) can draw upon the substantial literature
of sampling theory developed by sociologists or economists. The use of
standard references, such as Yates (1960) or Kish (1965), paying particular
attention to areally-based sampling designs, appears to suffice in most cases.
An interesting design, illustrating the strong relationship between problem
specification and subsequent stages of research, is seen in the work of Hess,
Riedel, and Fitzpatrick (1961). Most designs in this area would be randomized.

In contrast, when dealing with continuously distributed phenomena, such as
land use, spatial sampling plans are called for. Berry and Baker (1968) have
shown that a systematic type of design (based on a grid scheme, but not
regular in terms of point locations within a cell) is the most efficient in
such circumstances. In sampling from spatially continuous phenomena, a problem
exists in that the values attributed to any one sample point are related to
those of all surrounding points (the spatial autocorrelation factor, see
Section 29). Holmes (1970) indicates clearly how this contributes to the
failure of random selection procedures in location sampling. Line sampling
(the use of traverses) is also employed in some cases (Latham, 1963; Haggett
and Board, 1964).
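
A grid-based design that is not regular within cells can be sketched as below.
This follows one common formulation of a stratified systematic unaligned
sample, in which cells in the same row share a random x-offset and cells in
the same column share a random y-offset; whether it matches every detail of
the cited design is an assumption, and the function name is hypothetical.

```python
import random

def stratified_systematic_unaligned(rows, cols, cell=1.0, seed=0):
    """One sample point per grid cell, with offsets that break alignment.

    All cells in the same row share a random x-offset, and all cells in the
    same column share a random y-offset.
    """
    rng = random.Random(seed)
    x_offset = [rng.random() * cell for _ in range(rows)]  # one per row
    y_offset = [rng.random() * cell for _ in range(cols)]  # one per column
    return [
        (c * cell + x_offset[r], r * cell + y_offset[c])
        for r in range(rows)
        for c in range(cols)
    ]

# Points are listed row by row: index r * cols + c belongs to cell (r, c).
points = stratified_systematic_unaligned(rows=4, cols=5)
```

Every cell receives exactly one point, so coverage is even, yet the points do
not fall on a regular lattice.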

Finally, we must note that sampling procedures used in geomorphology
represent perhaps the highest degree of attainment in terms of specific
designs related to specific problem areas. Griffiths and Ondrick (1968)
illustrate the different approaches, as well as introducing methods of telling
whether a sample distribution approximates an assumed parent population
distribution (Section 15). Note that the population distributions of many
geomorphic characteristics are known (see especially Krumbein and Graybill,
1965). Researchers in this area have also investigated extensively the problem
of operator variance (errors due to different researchers measuring the same
phenomena) in experimental settings (Griffiths and Rosenfeld, 1954). Chorley
(1958) has described this effect in morphometric analysis.

General texts of value in sampling design and rnethods are listed in Section 4B. 
SECTION 12
DESCRIPTIVE STATISTICS 



Some overall characteristics of a body of data can be represented by summary
measures, which are based upon manipulations of values of one attribute. For
this reason, these measures are also called univariate statistics. If the
measurement procedure resulted in nominal or ordinal scales, tabular frequency
counts are usually made which can be represented graphically (bar-diagrams,
pictographs, etc.). With interval or ratio scales there is a continuum of such
values so that it is necessary to form groups or classes to make up a
frequency diagram or histogram. Absolute or relative frequencies can be
represented, and these values can also be summed across the classes to form
cumulative frequency distributions, or ogives.

Two main types of summary measures are calculated. 1) Measures of central
tendency, which describe the clustering of values about certain points, are
the first type. For nominally scaled data, the modal class (containing most
observations) can be obtained by inspection. The mode is the most frequent
value. In ranked data, the median is an appropriate measure; it is the middle
ranked value, so that exactly half the observations are located above or below
this point. The arithmetic mean (sum of values divided by the number of
observations) is a balancing point on the continuum of values for interval or
ratio scales. 2) Measures of dispersion, which describe the spread of
observations about the central value, are the second type. The range indicates
the complete span, being the difference between maximum and minimum values.
Deviations of individual values from the mean are used to form the most
important measures of distributions. If these deviations are summed the result
will naturally be zero. This sum, divided by the number of observations, is
known as the first moment about the mean of a distribution
($\mu_1 = \sum (X_i - \bar{X})/N = 0$). The second moment
($\mu_2 = \sum (X_i - \bar{X})^2/N$) is the mean of the squared deviations
about the mean, and is defined as the variance. The square root of this
quantity is the standard deviation.
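
These measures can be computed directly, as in the following sketch. The
observations are hypothetical and `moment` is an illustrative helper that
returns the k-th moment about the mean, dividing by N as in the formulas
above.

```python
from statistics import mean, median, mode

def moment(values, k):
    """k-th moment about the mean: mean of the k-th powers of the deviations."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

# Hypothetical interval-scale observations.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0, 12.0]

centre = {"mode": mode(values), "median": median(values), "mean": mean(values)}
value_range = max(values) - min(values)
first_moment = moment(values, 1)   # zero, up to floating-point rounding
variance = moment(values, 2)       # second moment about the mean
std_dev = variance ** 0.5
```

Note that the divisor here is N; many programs divide by N - 1 when the
variance of a population is being estimated from a sample.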

Many empirical frequency distributions have similar characteristics, so that
we may refer any sample distribution to some theoretical population
distribution. The normal distribution is the most important population
distribution in statistical analysis; and in this case, for example, the mode,
median, and mean have the same location. Departures from the symmetrical
(about the mean) bell-shaped curve in terms of the shape of the distribution
are called degrees of skewness. For example, if the median is greater than the
mean, the distribution is said to be negatively skewed. Skewness may be
measured by the third moment about the mean, while the fourth moment indexes
kurtosis: the degree to which the distribution is strongly peaked
(leptokurtic) or relatively flat (platykurtic) compared to the normal.
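
A brief sketch of the moment measures of shape follows. The standardization
used (dividing the third moment by the cubed standard deviation and the fourth
by the squared variance) is one common convention and is an assumption here;
the data sets are hypothetical.

```python
from statistics import mean, median

def moment(values, k):
    """k-th moment about the mean."""
    m = mean(values)
    return sum((x - m) ** k for x in values) / len(values)

def skewness(values):
    """Third moment about the mean, scaled by the cubed standard deviation."""
    return moment(values, 3) / moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth moment scaled by the squared variance (normal curve: 3)."""
    return moment(values, 4) / moment(values, 2) ** 2

# Hypothetical sample with a long tail of low values, and a symmetric one.
left_tailed = [1.0, 6.0, 7.0, 8.0, 8.0, 9.0, 9.0, 9.0, 10.0]
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

negative_skew = skewness(left_tailed)   # below zero: median exceeds the mean
flatness = kurtosis(symmetric)          # below 3: flatter than the normal curve
```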

A direct link with cartography is seen in the graphic representation of data
values, and several references deal with the problem of specifying the most
efficient class interval (for example, Armstrong, 1969; Jenks, 1963; Mackay,
1955; Scripter, 1970). Mapping can also be carried out in terms of class
intervals derived from the standard deviation of the sample distribution
(Yeates, 1965). The use of standard descriptive statistics in climatology is
seen in the articles by Portig (1965) and Sumner (1953), while a measure of
relative variability for different attributes (the coefficient of variation)
is employed by Fuchs (1960).

Cumulative frequency distributions are illustrated in Berry's (1961) study of
the rank-size rule. Alexandersson (1956) also used this type of graphic
representation in an early study of city classifications (see also Morrissett,
1958). Some of the indices derived in descriptive studies (Section 13) are
based on deviations from the mean, and similar measures have been employed in
classifications (e.g., Nelson, 1955). Note that the validity of this approach
depends on the sample distribution having characteristics of a normal curve.
Descriptive measures are also computed in morphometric analysis (Clarke,
1966).

Finally, it should be noted that these summary statistics treat the sets of
values in a linear fashion, disregarding the locative elements. Much of the
conventional statistical analysis, however, can be faulted in the same manner.
Tobler (1966) discusses the problems resulting from the two-dimensional aspect
of geographic data when measures refer more properly to one dimension (such as
time, for example). Some research has treated the problem of descriptive
statistics for areal distributions (Section 19).
INDEX CONSTRUCTION

Many descriptive indices have been derived in geographic research in order to
summarize particular facets of a problem. Some are related to the procedures
described in the preceding section; Lewis' (1966) level-of-living index is
based upon the mean and standard deviation, for example, while McEvoy (1968)
used ranking techniques. One of the better known indices was computed by
Weaver (cited in Section 12) to describe the mixing of crop types in counties.
It is also based on ranking and a comparison of actual proportions with ideal
sets, by means of deviation scores. The method can be applied to any situation
where relative proportions are organized in discrete groups; for example,
industry groups (Johnson and Teufner, 1968). It has generated much criticism,
however, both on substantive and methodological grounds (see Hoag, 1969; L. J.
Johnson, 1969).
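
The comparison of ranked actual proportions with ideal sets can be sketched as
follows. This is the least-squares form in which the method is usually
described (ideal share 100/k for each of the k leading crops); the details of
the original procedure should be checked against the source, and the crop
shares and function name are hypothetical.

```python
def weaver_combination(percentages):
    """Return (k, variance) for the best-fitting k-crop combination.

    For each k, the k largest shares are compared with the ideal share 100/k
    and the mean squared deviation is computed; the smallest value wins.
    """
    ranked = sorted(percentages, reverse=True)
    best = None
    for k in range(1, len(ranked) + 1):
        ideal = 100.0 / k
        variance = sum((p - ideal) ** 2 for p in ranked[:k]) / k
        if best is None or variance < best[1]:
            best = (k, variance)
    return best

# Hypothetical crop shares (per cent of harvested cropland) for one county.
shares = {"corn": 45.0, "oats": 38.0, "hay": 10.0, "wheat": 5.0, "soybeans": 2.0}
k, variance = weaver_combination(shares.values())
```

For these figures the two-crop combination (corn and oats) gives the smallest
deviation from its ideal set of 50 per cent each.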

Some indices attempt to compare a region's share in some attribute (for
example, employment in a type of manufacturing industry) with its share of
some basic aggregate. Location quotients and various coefficients of
localization attempt to describe this concept (Britton, 1965). Similar
reasoning applies to several indices of segregation developed in social
geography (Clarke, 1971; Timms, 1965; Duncan and Duncan, 1955). Cumulative
frequency distributions are used to define Lorenz curves for computing indices
of diversification or concentration (Berry, 1959; Conkling, 1963; Kuklinski,
1965; Krumme, 1969; I. Johnson, 1967). Many problem areas require indices that
have not been employed before. One example of this is Wong's (1969)
coefficient of choice-perception.
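
Two of these indices can be illustrated with a short sketch. The employment
figures are hypothetical, the function names are illustrative, and the Gini
coefficient is computed here from the area under the Lorenz curve by the
trapezoid rule, which is one of several equivalent formulations.

```python
def location_quotient(region_industry, region_total, nation_industry, nation_total):
    """Region's share of an industry relative to its share of the aggregate."""
    return (region_industry / nation_industry) / (region_total / nation_total)

def gini(values):
    """Gini coefficient from the Lorenz curve of non-negative values.

    The area under the Lorenz curve is found with the trapezoid rule; the
    coefficient is 1 minus twice that area (0 means an even distribution).
    """
    ordered = sorted(values)
    total = sum(ordered)
    n = len(ordered)
    cumulative, area = 0.0, 0.0
    for v in ordered:
        previous = cumulative
        cumulative += v / total
        area += (previous + cumulative) / (2 * n)   # trapezoid of width 1/n
    return 1 - 2 * area

lq = location_quotient(region_industry=300, region_total=2000,
                       nation_industry=5000, nation_total=100000)
even = gini([10, 10, 10, 10])
concentrated = gini([0, 0, 0, 40])
```

A location quotient above 1 indicates that the region holds more than its
proportionate share of the industry.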

A major difficulty with these descriptive measures is that they often do not
lead anywhere: they are specific to a problem and cannot be generalized.
Methodologically, they suffer because the sampling distributions of such
measures are not known; it therefore is not possible to associate any
probability statements with particular values. Two exceptions to this rule
exist. First, the Gini coefficient, derived from Lorenz curve analysis, is
related to a statistic that has a normal distribution (see King, p. 115).
Second, a measure of regional homogeneity, originally produced by Sherr
(196?), has been reworked by Radhakrishna and Subba (1968), who show that it
is related to the chi-square distribution (Section 16).

In the list of references, some work on operational definitions of difficult
concepts is also presented; for example, discussions of measures of shape
(Boyce and Clark, 1964; Lee and Sallee, 1970), and of the definition of
drainage basin axis (Abrahams, 1970; Ongley, 1968).
TESTING STATISTICAL HYPOTHESES:
GENERAL CONSIDERATIONS

In order to reach an objective decision about research hypotheses, a set of
rules is needed. The decision will be based on the outcomes of the sample and
the risks the investigator is willing to take in making an incorrect decision.
The particular sample of values is, of course, only one of a theoretically
infinite number of samples of the same size from the same population. The
complete set comprises the sample space, and in testing situations, this space
is partitioned into two regions: a region of acceptance, and a region of
rejection (or critical region). The decision to accept or reject is always
made with respect to the null hypothesis ($H_0$: the hypothesis of "no
differences"). If it is rejected, the alternate hypothesis ($H_1$) may be
accepted. $H_1$ may be regarded as the operational statement of the research
hypothesis.

Two types of error are possible for the decision taken. If $H_0$ is actually
true and it is rejected, then a Type I error has been made. The probability of
this error is generally referred to as $\alpha$, the coefficient of risk,
significance level, or size of the critical region. A Type II error (with
probability $\beta$) occurs when $H_0$ is accepted when in fact it is false.
The power of a test is defined as $1 - \beta$, the probability of rejecting
$H_0$ when it is false. Obviously, the Type II error should be minimized as
much as possible. The power is related to the type of statistical test chosen,
but generally it increases with a larger sample size. Alternatively, the size
of the sample can be determined by a consideration of the desired power of
subsequent tests, indicating once again the linkages between all stages of
experimental design.
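
The link between power and sample size can be sketched for the simplest case.
The example assumes a one-sided test on the mean of a normal population with
known standard deviation (a z-test), which is a simplification chosen for
illustration; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against mu = mu0 + effect.

    Assumes a normal population with known standard deviation `sigma`.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)        # boundary of the critical region
    shift = effect / (sigma / n ** 0.5)    # effect in standard-error units
    beta = z.cdf(critical - shift)         # P(accept H0 when H1 is true)
    return 1 - beta

small = power_one_sided_z(effect=2.0, sigma=10.0, n=25)
large = power_one_sided_z(effect=2.0, sigma=10.0, n=100)
at_null = power_one_sided_z(effect=0.0, sigma=10.0, n=25)
```

Quadrupling the sample size raises the power substantially for the same
effect, and when the effect is zero the rejection probability equals the
significance level.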

The rather formal procedures for carrying out tests of hypotheses can be
listed in a number of stages:

(1) state the null hypothesis ($H_0$), and the alternate hypothesis;

(2) choose a statistical test of $H_0$, stating the assumptions of using the
related statistical model;

(3) specify a significance level ($\alpha$);

(4) establish the relevant test statistic, which involves finding the
appropriate sampling distribution;

(5) define the region of rejection;

(6) compute the value of the test statistic, using the sample values. If the
computed value is greater than the tabulated value for the degrees of freedom
associated with the test (a concept related to sample size), then the null
hypothesis can be rejected. One can then make inferences about the nature of
the relationships implied in the research hypothesis for the population,
keeping in mind the chosen significance level. Note that any inferences must
be based on substantive reasoning with respect to the theoretical framework of
the problem.

In addition to the normal distribution, there are three sampling distributions
(based on samples from the normal density) which are used primarily in test
situations: (Student's) t, (Fisher's) F, and the chi-square ($\chi^2$)
distribution. The t-distribution is used in tests where sample means for two
regions (or two subsets of unit areas, or two groups in a sample of
households, etc.) are being compared. The null hypothesis in such a case would
be stated as $H_0$: $\mu_1 = \mu_2$; i.e., there is no difference between the
population means in the two regions. The null hypothesis is always stated in
terms of population parameters. This test will have little value if the
variances in the two regions are significantly different, since extreme values
seriously affect the effectiveness of the arithmetic mean as a measure of
central tendency. Usually, therefore, one tests for differences in the
variances of the regions first of all, and the F-distribution is used in this
case.
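
The stages listed earlier, applied to the comparison of two regions, can be
sketched as follows. The farm sizes are hypothetical, the function names are
illustrative, and the critical values are simply entered as read from standard
statistical tables rather than computed.

```python
from statistics import mean, variance

def variance_ratio(a, b):
    """F statistic: larger sample variance over the smaller, with its d.f."""
    va, vb = variance(a), variance(b)
    if va >= vb:
        return va / vb, (len(a) - 1, len(b) - 1)
    return vb / va, (len(b) - 1, len(a) - 1)

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = (pooled * (1 / na + 1 / nb)) ** 0.5
    return (mean(a) - mean(b)) / se, df

def reject_null(statistic, tabulated):
    """Stage 6: reject H0 when |statistic| exceeds the tabulated value."""
    return abs(statistic) > tabulated

# Hypothetical farm sizes (hectares) sampled in two regions.
region_1 = [52.0, 48.0, 55.0, 60.0, 45.0, 50.0]
region_2 = [40.0, 38.0, 44.0, 35.0, 42.0, 41.0]

f_value, f_df = variance_ratio(region_1, region_2)
t_value, t_df = pooled_t(region_1, region_2)
# Tabulated values: upper 5 per cent point of F(5, 5) = 5.05;
# two-tailed 5 per cent point of t with 10 d.f. = 2.228.
variances_differ = reject_null(f_value, 5.05)
means_differ = reject_null(t_value, 2.228)
```

Here the variances are not significantly different, so the pooled t-test is
reasonable, and the difference between the means is judged significant at the
chosen level.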

Examples of the application of these tests in the literature are given in
Mackay (1967), Rushton (1966), Rushton, Golledge, and Clark (1967), and Swan
(1970). The importance of a sound sampling basis for these procedures and the
overall relevance of statistical testing in geographic research are underlined
by Gould's article (1970). Other specific examples of tests are given in
Sections 16 and 17. The procedures outlined here are applicable, however, for
any work in which questions of significance (differences, relationships, etc.)
are involved, as in correlation or regression studies (Sections 21 and 27).
SECTION 15

TESTING THE FORM OF DISTRIBUTIONS

As indicated in the section on sampling, for many statistical tests the sample
must have been drawn at random from a population with a known probability
distribution, often the normal distribution. A number of procedures are
available that test whether the set of sample values obtained approximates an
expected or theoretical distribution. Some of these are basically graphic and
descriptive; they are referenced in this section, although a more formal
statement of tests of hypotheses was presented in Section 14. One example of
graphical methods is seen in extreme value analysis, especially important in
climatology and hydrology (Gumbel, 1967; Court, 1953; Dury, 1964) where
"log-Gumbel" probability paper is used (see also Gumbel, 1958 in Section 4B).

Tests of normality are most common in the literature. Graphical procedures are
also involved here: arithmetic or logarithmic probability paper is employed,
and cumulative relative frequencies are plotted against class mid-points. The
closer the resulting set of points to a straight line, the closer the
distribution is to normal (Strahler, 1954; Chorley, 1966). An improved version
of this method is to construct a fractile diagram, which essentially places a
confidence band around the theoretical straight line (Thomas, 1967; Gibson,
1970; Tiedemann, 1968). Other tests are numerical, based on the concepts of
statistical inference (see Section 14). A common test relates the observed
frequencies per class with those expected if the sample was drawn from a
normal distribution. The theoretical frequencies are obtained from tabulated
probabilities of a standard normal curve (with zero mean and unit variance),
and the relevant test statistic is chi-square (Maxwell, 1967). Note that all
these methods depend on some prior definition of class intervals, usually an
arbitrary process. A test that circumvents this difficulty has been derived by
Snedecor; it uses moment measures of the distribution and is based on the
t-statistic (Section 14). Berry and Tennant (1965) illustrate the use of this
test. Note that if the results of tests indicate that the values are not drawn
from a normal population, the researcher may choose to transform the data
(commonly using logarithms) or to use non-parametric methods of analysis
(Tanner, 1959).
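
The chi-square comparison of observed and expected class frequencies can be
sketched as below. The class limits are an arbitrary choice, as the text
notes; the data are artificial (evenly spaced normal quantiles), and the
function name is hypothetical. The normal curve is fitted with the sample mean
and standard deviation, so two further degrees of freedom are lost.

```python
from statistics import NormalDist, mean, pstdev

def chi_square_normal(values, boundaries):
    """Chi-square statistic comparing class counts with normal expectations.

    `boundaries` are the interior class limits; the two outer classes are
    open-ended. Returns the statistic and its degrees of freedom.
    """
    fitted = NormalDist(mean(values), pstdev(values))
    edges = [float("-inf")] + list(boundaries) + [float("inf")]
    n = len(values)
    chi2 = 0.0
    for lower, upper in zip(edges, edges[1:]):
        observed = sum(lower <= v < upper for v in values)
        p_lower = 0.0 if lower == float("-inf") else fitted.cdf(lower)
        p_upper = 1.0 if upper == float("inf") else fitted.cdf(upper)
        expected = n * (p_upper - p_lower)
        chi2 += (observed - expected) ** 2 / expected
    classes = len(edges) - 1
    # Two parameters were estimated, so d.f. = classes - 1 - 2.
    return chi2, classes - 3

# Artificial sample: 40 evenly spaced quantiles of a normal curve.
values = [NormalDist(50, 10).inv_cdf((i + 0.5) / 40) for i in range(40)]
chi2, df = chi_square_normal(values, boundaries=[40, 50, 60])
looks_normal = chi2 < 3.841   # tabulated chi-square, alpha = 0.05, 1 d.f.
```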

Other references cited in this section are concerned with tests of the
goodness-of-fit of sample distributions with those expected from some
theoretical model; for example, interactance models (Mackay, 1958). The
distribution of distances moved in migration, or distances separating marriage
partners, are also expected to take on certain forms, and samples can be
tested (Morrill and Pitts, 1967; Moore, 1971). Dacey (1964) tested the
outcomes of a probability model, using a modified form of the Poisson
distribution, a discrete probability function, to generate expected
frequencies of points per quadrat (see also Section 20). Finally, we might
note that it is not uncommon that a given set of sample values approximates
closely several related population distributions. Quandt (1964) has addressed
the problem that then arises: how to choose the best fitting distribution.
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SECTION 16 



OTHER CHI-SQUAI^E TESTS 

The chi-square statistic is often used in testing the outcomes of theoretical
models where observed and expected frequencies are involved. Besides fitting
samples to theoretical distributions (as described in Section 15), probability models
can also be tested in this manner (Brush and Gauthier, 1968). If the F-test of the
equality of variances in two regions, for example, is extended to the case of k
regions or groups, a chi-square statistic is involved. The test is known as Bartlett's
test of the homogeneity of variance; such homogeneity is a necessary assumption in
the analysis of variance (Section 17) and covariance (Section 28). The articles by
Maxwell (1967) and King (1961) include this test.

More commonly, chi-square tests are used in non-parametric situations (Section
18) when data are measured on a nominal scale. Two (or more) classifications of
unit areas are thus involved, and the test seeks to establish whether or not the
classifications are independent. The first step is to establish a contingency table (of
r rows by c columns) by forming a cross-classification of the areas. If the two
classifications are independent, then the expected cell probabilities are the product
of the respective row and column probabilities (the expected frequencies being these
products multiplied by the total count); this statement has a theoretical basis in
the concept of a joint probability function. These expected frequencies are
computed and compared to the observed frequencies (as the squared difference
divided by the expected frequency, cell by cell), and the results are summed to
produce a value of chi-square. This computed value is compared to the
tabulated chi-square, with (r-1) X (c-1) degrees of freedom, enabling one to accept
or reject $H_0$.
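
The contingency-table computation just outlined can be sketched in a few lines of
Python. This is a minimal illustration, not a published routine: the function name
and the 2 X 2 table of unit areas are hypothetical.

```python
def contingency_chi_square(table):
    """Return (chi-square, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence: expected count = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected

# Hypothetical cross-classification of 80 unit areas on two nominal scales.
chi2, dof, expected = contingency_chi_square([[30, 10], [10, 30]])
print(chi2, dof)  # 20.0 1
```

The computed value would then be compared with the tabulated chi-square for the
stated degrees of freedom at the chosen significance level.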

This method is employed often in questionnaire analysis (see Baumann, 1969).
A related statistic, C, the contingency coefficient, which varies between 0 and 1, is
often computed at the same time (Carey, Macomber, and Greenberg, 1968;
Friendly, 1965), as well as other measures of association such as lambda, gamma,
etc. Problems of interpretation arise when the number of cells with expected
frequencies of less than 5 is large, and when the contingency table is larger than
2 X 2. The first difficulty is usually resolved by reclassifying the data, while the
second can be eased appreciably because of the additivity property of chi-square.
Maxwell (1961), cited in Section 4B, shows how the total chi-square value can be
partitioned as a series of 2 X 2 tables as a result of this property. Ray (1965)
illustrates this method.

The general utility of the chi-square distribution in regional studies has been
discussed by Mackay (1958) and Zobler (1957). A novel use of the revealed
probability levels associated with different values of chi-square is presented by
Kellman and Adams (1970) in their constellation diagrams of the linkages between
categorized variables.

SECTION 17
ANALYSIS OF VARIANCE 



The analysis of variance, in its simplest form, essentially extends the t-test of
significant differences in means for two regions to the k-region case. The null
hypothesis is then written as $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, and the test
statistic used is based on the F-distribution. The analysis introduces an important
concept in statistical inference: the partition of total variance into component
parts. In the single-factor case, the two components are within-group (or region)
and between-group variance. Note that there is then assumed to be no interaction
between the groups.
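
The partition of variance in the single-factor case can be made concrete with a
short Python sketch. The function name and the three small "regions" are
hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def one_way_anova(groups):
    """Partition the sum of squares; return (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group component: group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group component: observations about their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical sample values for three regions.
regions = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova(regions))
```

For these values the between-group sum of squares is 26 and the within-group sum
of squares is 6, so the F ratio is 13 with 2 and 6 degrees of freedom.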

The method assumes that the samples are drawn from normally distributed
populations, and that within-group variance does not differ significantly from
region to region. If the first assumption is not met, transformations can be applied
to the sample values (Krumbein and Miller, 1954). The second assumption
(homoscedasticity) is treated by Bartlett's test (Section 16); if it is fulfilled, the
only way that a significant variation can exist from region to region is if the group
means lie at different elevations. An example of a single-factor model is presented
by Knos (1962) in his study of urban sectoral variations in land values. The validity
of regional divisions can also be examined in this manner (Zobler, 1958; Laut,
1967; see also Section 33), as well as hierarchical classifications of settlement as
expected in central place analysis (Mayfield, 1967). The technique also has
considerable utility in examining sources of variation (Davis, 1971).

In addition, the analysis of variance can be extended to include other factors.
For example, Murdie (1969) constructed a 6-sector X 6-zone grid for metropolitan
Toronto in order to examine concentric and zonal models of spatial variation in
urban social structure. The sector X zone (1st-order) interaction effect could then
be interpreted as indicative of the notion of nucleations within the overall patterns.
This type of two-factor model has also been used by Timms (1971) and discussed
by Johnston (1970).

It is clear that the number of factors could be increased, but difficulties in
interpreting 1st-order, 2nd-order (etc.) interactions also increase. The value of these
more complex models is great, however, as long as a firm grasp on the theoretical
implications is maintained. Boyce's (1965) analysis of urban travel patterns is
exemplary in this respect. A comprehensive review of different models is presented
by Krumbein and Miller (1954), and the obvious relevance of the analysis for
questions of experimental design is discussed by Krumbein (1953). Operator
variance can also be evaluated in this context (Hill, 1968).

SECTION 18
NON-PARAMETRIC STATISTICS 

When measurement is carried out using nominal or ordinal scales, most
parametric tests cannot be applied, so that a number of tests have been devised for
this situation. Since no assumptions are made about the nature of the population
distribution from which the sample was drawn, these methods are also called
"distribution-free." Non-parametric methods are particularly useful when sample
size is small (Haggett, 1961); in fact, regular statistical methods cannot be used with
small samples unless the population distribution is known exactly. If all the
assumptions of parametric methods are fulfilled and if the measurement is of the
required level, the non-parametric tests are wasteful of data. The degree of this
difference is expressed by the power efficiency of the non-parametric test, and the
distinctions can be virtually eliminated by increasing sample size.

There are many tests in this category: Siegel, 1956 (cited in Section 4B) covers
most methods. Useful reviews of the techniques are presented by Keeping (1967)
and French (1971). Most parametric tests have non-parametric equivalents; for
example, in testing for differences between measures of central tendency in two
regions, the non-parametric test involves the median. Analysis of variance for the
k-region case (the Kruskal-Wallis test) is illustrated by Fenwick (1965). Measures of
association include contingency analysis using the chi-square statistic (Section 16)
for nominal data, while correlation coefficients (Section 21) can be computed for
ranked data. For example, there are two coefficients of simple correlation:
Spearman's rho (Bucklin, 1966; Sternstein, 1962; Wood, 1967) and Kendall's tau
(Cox, 1969; Golledge, Rushton, and Clark, 1966). A coefficient of multiple
correlation is also available: Kendall's coefficient of concordance (W).
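
To show how a rank correlation is obtained from ranked data, the sketch below
computes Spearman's rho as the product-moment correlation of the two sets of
ranks, with tied values given their mean rank. The function names and the five
paired observations are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: product-moment correlation of the rank sets."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical paired measurements for five areal units.
print(spearman_rho([10, 20, 30, 40, 50], [1, 3, 2, 5, 4]))
```

With no ties this agrees with the familiar shortcut 1 - 6(sum of squared rank
differences) / n(n^2 - 1), which gives 0.8 for the data shown.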

The Kolmogorov-Smirnov test has been employed often in distribution fitting
(Section 15) when the sample size is small (Dacey, 1968). An associated statistic,
Dmax, is also very useful for those cases in contingency analysis where chi-square
cannot be used because the number of cells with expected frequencies of less than 5
is large, and where reclassification would remove much detail in the classification.
In all K-S tests, the cumulative proportions are used, and a simple function of the
maximum difference in the two sets (Dmax) is known to be approximately distributed
as chi-square. Johnston (1966) illustrates these procedures.

The term geostatistics is used to refer to two main channels of research into
methods of applying linear statistics to areal distributions. The first approach,
centrography, has the longest tradition; centers of gravity of populations of the
USA and the USSR attracted much interest in the first two decades of this century
(Sviatlovsky and Eells, 1937). Bachi's work (1957, 1963) revitalized the subject and
added important measures of dispersion (the standard distance, a bivariate
equivalent to the standard deviation) to the existing measures of central tendency.
In the latter case, there was some confusion over the arithmetic-mean center and
the median center (Hart, 1954), which was settled in the debate over the "point of
minimum aggregate travel" (Porter, 1963, 1964; Court, 1964).
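
The two basic centrographic measures can be illustrated as follows. This is a
sketch under simple assumptions: coordinates are planar, the optional weights
stand for populations, and the function names and sample points are hypothetical.

```python
import math

def mean_center(points, weights=None):
    """Arithmetic-mean center (center of gravity) of weighted points."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy

def standard_distance(points, weights=None):
    """Root-mean-square distance of the points from their mean center."""
    if weights is None:
        weights = [1.0] * len(points)
    cx, cy = mean_center(points, weights)
    total = sum(weights)
    ss = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)

# Four hypothetical settlements at the corners of a square.
towns = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(mean_center(towns), standard_distance(towns))
```

For the square shown, the mean center is (1, 1) and the standard distance is the
square root of 2, since every point lies that far from the center.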

The second tradition is embedded in attempts to utilize physical laws in the
study of social phenomena, evidenced in gravity models and in social physics. The
latter research is associated with the work of Stewart and Warntz (1958, 1959) and
Warntz's subsequent development of macrogeography (1965, 1967). The main goal
of macrogeographic analysis is to examine the role of distance in understanding
regularities in aggregates of socio-economic data, and to this end potential models
are employed. Neft (1966) has summarized much of this work and integrated it
with the measures derived in centrography. Besides descriptive statistics, Neft has
also discussed measures of areal association.

Geostatistical methods are not reported frequently in the literature, although
Wolpert (1967) has emphasized the value of the median center and modal center of
migration fields in developing appropriate parameters for the analysis of spatial
flow phenomena. Finally, the solution of the Weberian "location-triangle" problem
in economic geography utilizes similar approaches to those employed in
geostatistics; see, for example, the contributions by Kuhn and Kuenne (1962) and
Cooper (1967).
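
The median center (point of minimum aggregate travel) has no closed-form
solution and is usually found iteratively. The sketch below uses a Weiszfeld-type
reweighting scheme, which is one common approach and is offered here only as an
assumption; it is not claimed to reproduce the algorithms in the works cited.
All names and sample coordinates are hypothetical.

```python
import math

def aggregate_distance(points, center):
    """Sum of straight-line distances from every point to center."""
    cx, cy = center
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def median_center(points, iterations=200, tol=1e-9):
    """Approximate the point of minimum aggregate travel."""
    n = len(points)
    # Start from the arithmetic-mean center.
    x = sum(px for px, _ in points) / n
    y = sum(py for _, py in points) / n
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Guard against division by zero when sitting on a data point.
            d = max(math.hypot(px - x, py - y), tol)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y

places = [(0, 0), (4, 0), (0, 3), (10, 10)]
center = median_center(places)
print(center, aggregate_distance(places, center))
```

Comparing the aggregate distance at this point with that at the mean center shows
why the two measures of central tendency should not be confused.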



BACHI, R. "Statistical Analysis of Geographical Series," Bulletin de l'Institut
International de Statistique, Vol. 36, 1957, pp. 229-240.

BACHI, R. "Standard Distance Measures and Related Methods for Spatial
Analysis," Papers of the Regional Science Association, Vol. 10, 1963, pp.
83-132.

CAPRIO, R. J. "Centrography and Geostatistics," The Professional Geographer,
Vol. 22, 1970, pp. 15-19.

COOPER, L. "Solutions of Generalized Locational Equilibrium Models," Journal of
Regional Science, Vol. 7, 1967, pp. 1-18.

COURT, A. "The Elusive Point of Minimum Travel," Annals, AAG, Vol. 54, 1964,
pp. 400-403.

HART, J. F. "Central Tendency in Geographical Distribution," Economic
Geography, Vol. 30, 1954, pp. 48-59.

KUHN, H. W. and R. E. KUENNE. "An Efficient Algorithm for the Numerical
Solution of the Generalized Weber Problem in Spatial Economics," Journal of
Regional Science, Vol. 4, 1962, pp. 21-23.

MURPHY, R. E. and H. E. SPITTAL. "Movements of the Center of Coal Mining in
the Appalachian Plateaus," Geographical Review, Vol. 35, 1945, pp.
624-633.

NEFT, D. S. Statistical Analysis for Areal Distributions. (Monograph Series No. 2).
Philadelphia: Regional Science Research Institute, 1966, 173 pp.

PORTER, P. W. "What is the Point of Minimum Aggregate Travel?" Annals, AAG,
Vol. 53, 1963, pp. 224-232.

PORTER, P. W. "A Comment on the Elusive Point of Minimum Travel," Annals,
AAG, Vol. 54, 1964, pp. 403-406.

PRUNTY, M. "Recent Quantitative Changes in the Cotton Regions of the
Southeastern States," Economic Geography, Vol. 27, 1951, pp. 189-208.

SHACHAR, A. "Some Applications of Geo-Statistical Methods in Urban Research,"
Papers of the Regional Science Association, Vol. 18, 1967, pp. 197-206.

STEWART, J. Q. and W. WARNTZ. "Physics of Population Distribution," Journal
of Regional Science, Vol. 1, 1958, pp. 99-123.

STEWART, J. Q. and W. WARNTZ. "Some Parameters of the Geographical
Distribution of Population," Geographical Review, Vol. 49, 1959, pp.
270-272.

SVIATLOVSKY, E. E. and W. C. EELLS. "The Centrographical Method and
Regional Analysis," Geographical Review, Vol. 27, 1937, pp. 240-254.

WARNTZ, W. Macrogeography and Income Fronts. (Monograph Series No. 3).
Philadelphia: Regional Science Research Institute, 1965.

WARNTZ, W. "Macroscopic Analysis and Some Patterns of the Geographical
Distribution of Population in the United States, 1790-1950," in Garrison,
W. L. and D. F. Marble (eds.). Quantitative Geography. Part I: Economic and
Cultural Topics. (Studies in Geography No. 13). Evanston, Ill.: Northwestern
University, 1967, pp. 191-218.

WARNTZ, W. and D. NEFT. "Contributions to a Statistical Methodology for Areal
Distributions," Journal of Regional Science, Vol. 2, 1960, pp. 47-66.

WOLPERT, J. "Distance and Directional Bias in Inter-Urban Migratory Streams,"
Annals, AAG, Vol. 57, 1967, pp. 605-616.



REFERENCE: Section 4A. King (92-97).



POINT PATTERN ANALYSIS

The spatial distribution of points in a study area is analyzed by a set of
techniques that were originally developed in plant ecology (Greig-Smith, 1964).
The general format of these methods is that the observed set is compared to the
theoretical set of points that would be generated by one of a number of probability
processes. For example, the Poisson distribution can be used to generate an
expected point set that is randomly distributed, while the negative binomial
function is thought to generate a clustered set. The technical details of applying
these functions can be appreciated by reference to Harvey (1967). A useful listing
of most of these probability models, with comments on parameter estimation and
the availability of published tables, is given by McConnell (1966).

The primary question in this research area is: to what extent can the observed
set of points be described as regular, random, or clustered? There are two
approaches to answering this question: 1) quadrat counts and 2) point-to-point
distance measurements. In the first case, a grid is laid over the study area, and the
frequency of cells that contain 0, 1, 2, ... points is tabulated (Getis, 1964;
Harvey, 1966). This observed frequency distribution can then be compared to a
theoretical one, derived perhaps from the Poisson process, in which case
randomness would be tested. Under certain assumptions concerning the
independence of point allocation to cells, the chi-square test is applicable. We may
note two problems in this method: a) the parameters used in the probability
functions generally involve some estimate of the density of points per unit area;
and b) the quadrat itself can be of varying size, so that different conclusions may
be reached about the randomness of the distribution. In the first case, it is clear
that an adequate specification of the study area boundary is necessary (Hsu and
Tiedemann, 1968), although recent work in pattern recognition (Hudson, 1969)
may identify sub-areas within a large region in which the measures can be applied.
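
The quadrat-count procedure can be sketched as below. The example assumes that
the Poisson density is estimated as the mean number of points per cell and that
counts at or above a chosen class are pooled; the function names and the ten cell
counts are hypothetical.

```python
import math

def poisson_expected(counts_per_cell, max_class):
    """Expected numbers of cells holding 0..max_class points (last pooled)."""
    n_cells = len(counts_per_cell)
    density = sum(counts_per_cell) / n_cells  # mean points per quadrat
    probs = [math.exp(-density) * density ** k / math.factorial(k)
             for k in range(max_class)]
    probs.append(1.0 - sum(probs))            # P(count >= max_class)
    return [n_cells * p for p in probs]

def quadrat_chi_square(counts_per_cell, max_class):
    """Compare observed cell frequencies with Poisson expectations."""
    observed = [0] * (max_class + 1)
    for c in counts_per_cell:
        observed[min(c, max_class)] += 1
    expected = poisson_expected(counts_per_cell, max_class)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # One degree of freedom is lost to the total, one to the estimated density.
    dof = (max_class + 1) - 2
    return chi2, dof, observed, expected

cells = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3]  # hypothetical points per quadrat
print(quadrat_chi_square(cells, 3))
```

Repeating the calculation with a different quadrat size is a simple way to see the
second problem noted above, since the tabulated frequencies change with the grid.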

Distance measures have been developed from nearest-neighbor statistics used in
ecology (Clark and Evans, 1954) and generalized to order-neighbor distances by
Dacey (see Dacey and Tung, 1962, who also describe the regional-neighbor
approach). The method involves computation of the mean distance and associated
variances for each order, and comparison of these to expected distances. For
example, under the assumption that the first-order distances are drawn from a
normal population, a density-dependent expected mean can be derived.
Randomness can then be tested using the standard normal curve. Alternatively, a
ratio of observed and expected mean distances can be computed (R: the nearest
neighbor measure). The range of R is from 0, indicating complete clustering, to
2.14 for a hexagonal, most regular, pattern. If R = 1, the distribution is said to be
random. This measure is illustrated in the studies by King (1962) and Barr et al.
(1971).
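
A minimal sketch of the nearest neighbor measure follows. It uses the expected
mean distance 1 / (2 * sqrt(density)) and the standard error constant 0.26136
associated with the Clark and Evans formulation, and it ignores boundary effects;
the function name, the points, and the study-area size are hypothetical.

```python
import math

def nearest_neighbor_R(points, area):
    """Return (R, z) for first-order nearest-neighbor distances."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    r_obs = sum(nearest) / n                  # observed mean distance
    density = n / area
    r_exp = 1.0 / (2.0 * math.sqrt(density))  # expected under randomness
    std_error = 0.26136 / math.sqrt(n * density)
    return r_obs / r_exp, (r_obs - r_exp) / std_error

# Four points on a unit square lattice, each "owning" one unit of area.
lattice = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(nearest_neighbor_R(lattice, area=4.0))
```

The square lattice yields R = 2.0, close to but below the hexagonal maximum; the
second value returned is the standard normal deviate used to test randomness.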

Only a sample of the literature is given in the listing; see King's survey of the
field. Although settlement patterns have been the main focus of empirical studies, it
is clear that any distribution that can be represented as a set of points can be
evaluated using these methods; see, for example, the use for drumlins, Smalley and
Unwin (1968) and Trenhaile (1971). The implicit hope that analysis of point
patterns would lead to greater understanding of processes generating the patterns
has not been fulfilled. Indeed, Harvey's work (1966) indicates that any attempt to
understand process from form analysis is likely to be self-defeating. In this sense,
then, the great amount of research effort expended in the analysis of point patterns
has not had any positive substantive feedback for the discipline. On the other hand,
Dacey's contribution to spatial stochastic process modeling is exemplary in pointing
up the evolution of geography toward a theoretical discipline.

SECTION 21
CORRELATION 



In correlation studies the emphasis is on the interdependence between two or
more variables, with no implications of functional relationships. In terms of the
matrix organization of geographic data, the researcher would be concerned with
comparing two or more columns, that is, examining the covariations of two or more
spatial distributions. It is worth noting that this measure can be derived directly
from probability theory; covariance has a specific interpretation in this respect. If
two random variables, say X and Y, are independent, then the variance of their sum
is equal to the sum of the respective variances:
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. This
expression is modified by the addition of twice the covariance term,
$2\,\mathrm{Cov}(X, Y)$, for the case of dependent random variables. Note that the
covariance of a variable with itself is the variance:
$\mathrm{Cov}(X, X) = \sigma_x^2$. The population correlation ($\rho$ = rho) between
X and Y is defined as $\rho = \mathrm{Cov}(Z_x, Z_y)$, where $Z_x$ and $Z_y$ are the
standardized variables for X and Y: e.g. $Z_x = (X_i - \mu_x) / \sigma_x$.
Alternatively, one can write
$\rho = \mathrm{Cov}(X, Y) / (\sigma_x \sigma_y)$.

The sample correlation coefficient (r) expresses the degree of association
between two variables, and it has a range from -1 (a perfect negative or inverse
relation) to +1 (perfect positive or direct association). The coefficient, then,
expresses both magnitude and direction of association. On the assumption of a
random sample from a bivariate normal population distribution, the research
hypothesis of a significant association may be tested (usually, $H_0: \rho = 0$) by
means of a t-test. It is also possible to test for significant differences between the
correlation coefficients of two regions ($H_0: \rho_1 = \rho_2$) by means of a
transformation and reference to the standard normal distribution; see Thomas, 1962
(cited in Section 27), who uses the test for different time periods. The square of the
correlation coefficient, $r^2$, is known as the coefficient of determination, and it
measures in a general sense the proportion of total variance that the two variables
have in common. It is more correctly derived as a particular ratio of components of
variance in regression problems (Section 27), but in empirical studies it is usually
reported as a percentage, or as the level of explained variation.
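
The quantities named in this paragraph can be computed as in the following
sketch. The transformation used for comparing two coefficients is assumed here to
be Fisher's z (the inverse hyperbolic tangent of r); the function names and the
data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Sample product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def z_for_difference(r1, n1, r2, n2):
    """Standard normal deviate for H0: rho1 = rho2 (Fisher's z assumed)."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    return (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(r, r * r, t_for_r(r, len(x)))
```

For these five pairs the coefficient of determination is 0.6, so 60 per cent of the
variation would be reported as "explained" in the descriptive sense used above.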

Thus far only the bivariate situation has been described. With more than two
variables, two alternative approaches are possible. First, the researcher may wish to
compute all possible pair-wise correlations from a set of m attributes. In this way, a
new matrix of intercorrelations (order m X m) can be derived for further analysis,
as by means of factor analysis (Section 24); see, for example, Berry (1963) and
Norman (1969). Second, a multiple correlation analysis may be carried out. In this
case, the coefficient of multiple correlation (R) indicates the degree to which two
or more variables relate to a third. For example, $R_{y.xz}$ is the multiple
correlation coefficient of X and Z with Y. $R^2$ is defined as the coefficient of
multiple determination.

In multiple correlation studies it is also instructive to measure the association
between any two variables, with the influence of other variables held constant
statistically. This is achieved by the partial correlation coefficients, which are
usually denoted as follows: if three variables are involved (e.g. $R_{y.xz}$), then
there are three partial correlation coefficients: $r_{yx.z}$, $r_{yz.x}$, and
$r_{xz.y}$. The first, indicating the association between X and Y with the
influence of Z held constant, may be different even in sign from $r_{yx}$, the
simple correlation coefficient. Note that partial correlation coefficients have been
used in two major contexts: in causal analysis (Section 23) and in step-wise
regression procedures (Aangeenbrug, 1968; see also Section 27). A standard text on
correlation and regression methods is Ezekiel and Fox, 1959 (cited in Section 4B).
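
The first-order partial correlation can be obtained directly from the three simple
coefficients. The sketch below uses the standard formula; the function name and
the illustrative coefficients are hypothetical and were chosen to show a change of
sign.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of X and Y with the influence of Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator

# A weak positive simple correlation between X and Y ...
r_xy, r_xz, r_yz = 0.2, 0.8, 0.6
# ... becomes negative once the common influence of Z is removed.
print(partial_r(r_xy, r_xz, r_yz))
```

Here the simple coefficient of 0.2 corresponds to a partial coefficient of about
-0.58, because both variables are strongly related to the third.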

In geographic applications it seems that most studies do not necessarily fulfill
the assumptions for the testing of hypotheses. Instead of using a complete
inferential framework, then, correlation studies appear to be mostly descriptive; for
example, in comparing different measures of manufacturing (Alexander and
Lindberg, 1961; Leigh, 1969; Wong, 1968). Also, it is useful to examine
correlations to further understanding of any regression analysis that is subsequently
carried out (Ambrose, 1970). A more explicit representation of this approach is
seen in the diagramming of correlation bonds (Melton, 1958). Smith (1965)
illustrates an interesting application of correlation analysis. The matrix is
transposed (reversal of rows and columns) and correlations computed between areal
units (towns in the study) in order to classify them by means of cluster analysis
(Section 32).

Difficulties in applying this technique in geographic research stem from the
different sizes of areal units, as might be expected from previous statements made
about the influence of the aggregation effect. One solution, weighting the
calculations by the areas involved, was proposed by Robinson (1956). Thomas and
Anderson (1965) present an inferential framework on more general grounds than
those used by Robinson, but there are still a number of assumptions involved that
inhibit a general solution. More importantly, the shape and orientation of
two-dimensional collection units cannot be handled by such methods. Curry (1966)
raised this question and generated much research on the way in which
administrative units effectively filter out characteristic frequencies in the data.
Spatial filtering methods attack this problem directly (MacDougall, 1970), and
references in Sections 22, 29, 30, and 31 are all pertinent to the problem.

SECTION 22
ECOLOGICAL CORRELATION 



Data on individuals, aggregated into a set of basic areal units such as counties,
form the basic input for many geographic studies. Variations between areal units,
analyzed in a number of ways, can then be used to make inferences about area
structure or the structure of characteristics of areas (Beshers, 1960). This would
appear to be a primary level of description (no matter how advanced the techniques
employed) concerning spatial distributions or arrangements, and it naturally does
not allow the researcher to make any statements about processes thought to lead to
the observed patterns. Process, in most studies, is interpreted as the outcomes of
sets of decisions made by individuals or groups, with differentiated locational
impacts on an existing spatial distribution.

The question that arises at this juncture is: can inferences about individual-level
behavior (= process) be made from observed variation in terms of unit areas? This is
the problem of ecological correlation. The recent volume edited by Dogan and
Rokkan (1969) is particularly recommended for a series of articles on aspects of the
problem (Alker, 1969; Valkonen, 1969). Certain assumptions are necessary to allow
the transfer of inferences across the scale difference, but Valkonen illustrates how
this appears to be especially a function of areal unit size: the larger the area, the
higher the ecological variation and the higher the ecological correlation compared
to individual correlation. Clearly, the effect of different-sized units is
methodologically related to the problem of modifiable units (Section 21) and to
filtering techniques used in trend analysis (Tobler, 1969, cited in Section 30) and in
space series analysis (Casetti, 1966, cited in Section 31). Substantive interpretations,
however, differ considerably, at least as reported in the literature. A careful sorting
out of theoretical and methodological elements is demanded in reading this
literature.
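
The gap between individual and ecological correlation can be shown with a
deliberately artificial example. In the sketch below, six hypothetical individuals
are grouped into three areas; the data and names are invented purely to illustrate
the aggregation effect and carry no empirical meaning.

```python
import math

def pearson_r(x, y):
    """Sample product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (x, y) scores for two individuals in each of three areas.
areas = {
    "A": [(1, 3), (3, 1)],
    "B": [(3, 5), (5, 3)],
    "C": [(5, 7), (7, 5)],
}

# Individual-level correlation: every person is one observation.
people = [p for members in areas.values() for p in members]
r_individual = pearson_r([x for x, _ in people], [y for _, y in people])

# Ecological correlation: each area is represented by its mean scores.
mean_x = [sum(x for x, _ in m) / len(m) for m in areas.values()]
mean_y = [sum(y for _, y in m) / len(m) for m in areas.values()]
r_ecological = pearson_r(mean_x, mean_y)

print(r_individual, r_ecological)
```

With these values the individual correlation is about 0.45 while the correlation
of area means is 1.0, since averaging removes the within-area variation.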

In order to test adequately the applicability of this idea, a strategy is needed
whereby individual data, referenced by locational coordinates, are available (a rare
situation) and different aggregation factors can be applied to them. Goheen, 1970
(cited in Section 26) has presented a series of tables of factor structures (Section
24) for the city of Toronto over four time periods (pp. 139, 158, 174, 207), based
on individual and areal aggregations of the same basic information. At the small
scale used in his study, Goheen concluded that there was little difference in the
results, i.e. ecological and individual correlations were relatively congruent.

SECTION 23
CAUSAL ANALYSIS

In empirical studies, the control of independent variables possible in laboratory
situations (holding other factors constant) is not feasible, so that a particular
methodology to establish causal connections between factors has been derived,
largely due to the work of Blalock, 1964 (cited in Section 4B). The technique is
based upon the partial correlation coefficients; for example, when a causal link
between factors X and Y is expected and $r_{xy.z} = 0$, the product
$r_{xz} r_{yz}$ can be used to predict $r_{xy}$ as an expected simple correlation
(because the numerator of $r_{xy.z}$ is $r_{xy} - r_{xz} r_{yz}$). This expected
correlation can then be compared to the computed coefficient, providing an
indication of the adequacy of the causal model. The lack of any causal connection
between factors is expressed by $r_{xy} = 0$; this can be compared directly to the
observed correlation.
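
The comparison of predicted and observed correlations can be sketched for the
three-variable case. The function name, the labels of the two candidate models,
and the coefficients are hypothetical; the sketch only restates the two prediction
rules given above and is not a reconstruction of any published procedure.

```python
def model_discrepancies(r_xy, r_xz, r_yz):
    """Observed r_xy minus the value each simple causal model predicts."""
    return {
        # X and Y related only through Z: r_xy.z = 0, so r_xy = r_xz * r_yz.
        "linked only through Z": r_xy - r_xz * r_yz,
        # X and Y causally unconnected: r_xy = 0.
        "no connection": r_xy - 0.0,
    }

# Hypothetical observed coefficients among three factors.
result = model_discrepancies(r_xy=0.30, r_xz=0.60, r_yz=0.50)
for model, gap in result.items():
    print(model, round(gap, 3))
```

A discrepancy near zero suggests the corresponding model is consistent with the
observed coefficients; how small is small enough remains a matter of judgment.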

Cox (1968) has illustrated this technique in a study of suburban voting behavior,
in which the inadequacies of an initial model are treated by the development of a
better causal situation with different linkages. Criticisms of this article point out
two major difficulties in applying the method in geographic research: 1) Taylor
(1969) indicates that the researcher must apply extreme caution in the statement of
relevant causal linkages; and 2) Kasperson (1969) shows that the transition from
theoretical background to empirical operationalization has to be carefully
evaluated.

CAPECCHI, V. and G. GALLI. "Determinants of Voting Behavior in Italy: A
Linear Causal Model of Analysis," in Dogan, M. and S. Rokkan (eds.).
Quantitative Ecological Analysis in the Social Sciences. Cambridge, Mass.:
M.I.T. Press, 1969, pp. 235-283.

Annals, AAG, Vol. 58, 1968, pp. 111-127.

COX, K. R. "Comments in Reply to Kasperson and Taylor," Annals, AAG, Vol. 59,
1969, pp. 411-415.

KASPERSON, R. E. "On Suburbia and Voting Behavior," Annals, AAG, Vol. 59,
1969, pp. 405-411.

TAYLOR, P. J. "Causal Models in Geographic Research," Annals, AAG, Vol. 59,
1969, pp. 402-404.

SECTION 24
FACTOR ANALYSIS AND PRINCIPAL COMPONENTS
ANALYSIS: AN OVERVIEW

A variety of methods can be described under the general heading of factor
analysis. They are all concerned with the relationships that exist between sets of
variates; in other words, interdependencies among a set of attributes are the major
concern. This has already been noted in the case of correlation studies, where an
underlying concept was the extent to which two or more variables shared a
common amount of the total variance between them. Factor analytic techniques
seek a further answer to questions raised in this context: given a large number of
variables, is it possible to describe composite variables, fewer in number, perhaps
uncorrelated with each other, that summarize the known degree of redundancy that
exists in the larger set? Factor analysis does provide an adequate answer to this
question and is one of a number of techniques usually grouped together in
multivariate analysis. The sampling assumptions are rather more restrictive for these
analyses, and geographic research has tended to shy away from any inferential
questions in this context. Thus factor analytic studies tend to be largely descriptive
of regional structure.

Two forms of analysis dominate the field: principal components (Section 25) 
and factor analysis (Section ;26). .The differences between the two approaches have^ 
been adequately stated by King in his tex.t,but they are rarely made explicit in the 
literature. Indeed, the particular modpl us6d is not always specified, and on 
occasion the possible options for factor analysis (e.g. rotation) are applied 
incorrectly to the principal components model. In the latter case, it seems clear, that 
the researcher works from data towards the specificsUion of a theoretical structure 
for; the domain he is studying. The analysis itsen is simply a mathematical 
orthogonalization (a process of making new variables independent of each other) of . 
the existing set of attril^utes, and in the execution of the method tfie researcher 
may be lucky to give some empurical meaning to the derived mathematical artifacts 
or compone^nts. In contrast, a theoretical model should be tested by factor analysis: 
does the model agree with thet data? If so, estimates of the parameters can be made. 
Even with these strict differences between the two models, it is clear that some 
prior evaluation of the likely outcome of applying any technique should be made. 
In this case, relevant questions are: how many factors can be expected; what types 
of factors, seen as combinations of the original variables; how should the original 
attributes be defined in order to answer the research questions posed; what sort of 
relationships can be expected among the new factors; and, what are the most 
meaningful communality estimates? It is particularly the last two questions that 
serve to differentiate the two models. In principal components analysis, only 
orthogonal components are produced and there would seem to be little point in 
requiring non-orthogonal solutions. As we shall see, this is not the case in factor 
analysis, where oblique solutions may be more relevant, theoretically. Communality 
estimates refer to the extent that the particular factor solution accounts for the 
variance of any original variate. Operationally, these values are placed in the 
principal diagonal of the intercorrelation matrix, which is the stage at which 
analysis usually commences. In the principal components method, 1.0 is the 
element value. In other words, the variance is emphasized. (Since the attributes are 
usually standardized before the intercorrelations are computed, they have zero 
mean and unit variance.) Factor analytic models have a different orientation, and 
communality estimates used in geographic research often use $R^2$, the coefficient of 
multiple determination of the variable in question with all others in the set. Thus, 
these latter models stress the covariance aspects of the problem. 
The steps in the analysis may be outlined in brief for the principal components 
solution, and then the differences for the factor analytic model can be stated. As 
we have already indicated, the attribute matrix (order n places by m attributes) is 
usually standardized, and the matrix of zero-order correlations is computed 
(m × m). Analysis of the latter matrix proceeds by deriving the associated 
characteristic equation (a polynomial of the same order as the matrix) whose roots 
define the latent values or eigenvalues. For each eigenvalue, an eigenvector can be 
computed and when standardized the matrix (m × p) describing factor structure 
can be derived (the standardized eigenvector associated with eigenvalue k is entered 
into the matrix as column k). In the case of principal components, p = m in the 
complete solution. More commonly, however, p is taken as some number of factors 
less than the total number, and criteria for doing this are inexact. Included are 
either those factors with eigenvalues greater than 1.0 or those that account for 
more than 5% of the total variance (measured as the proportion of any eigenvalue 
to the total variance, which in this case equals the sum of the eigenvalues). The 
elements of the columns of the factor matrix, often called factor loadings, can be 
used for making a geographic interpretation of the component. This final stage of 
the analysis is usually aided considerably by computing the component scores, the 
location of each of the individual unit areas on the new variables, computed as a 
direct linear function. When these scores are mapped, it is usually possible to give 
some empirical interpretation to the component. Finally, we should note that the 
factors are extracted sequentially so that the first one accounts for as much of the 
variance as possible; the second then accounts for as much of the residual variance 
as possible and in addition is orthogonal to the first; and so on, for remaining 
factors. 
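
The steps just outlined can be sketched in a few lines of modern code. The 
following is only an illustration under stated assumptions: it uses Python with 
NumPy (neither appears in the original references), the function name 
principal_components and its arguments are hypothetical, and the default cut-off is 
the "eigenvalue greater than 1.0" rule mentioned above. 

```python
import numpy as np

def principal_components(data, n_keep=None):
    """R-mode principal components of an (n places x m attributes) matrix.

    Hypothetical helper for illustration; not taken from the cited texts.
    """
    # Standardize each attribute to zero mean and unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # m x m matrix of zero-order correlations (1.0 on the principal diagonal).
    r = np.corrcoef(data, rowvar=False)
    # Latent values (eigenvalues) and unit-length eigenvectors of the matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(r)
    # eigh returns ascending order; re-order so the first component is largest.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    if n_keep is None:
        # Assumed default: keep components with eigenvalue greater than 1.0.
        n_keep = int(np.sum(eigenvalues > 1.0))
    vectors = eigenvectors[:, :n_keep]
    # Loadings: eigenvectors scaled by the square root of their eigenvalue.
    loadings = vectors * np.sqrt(eigenvalues[:n_keep])
    # Component scores: location of each unit area on the new variables.
    scores = z @ vectors
    # Proportion of total variance accounted for by each eigenvalue.
    share = eigenvalues / eigenvalues.sum()
    return eigenvalues, loadings, scores, share
```

With p = m the squared loadings of each attribute sum to 1.0, and the scores on 
different components are uncorrelated, which is the property exploited when 
component scores are used as independent variables in regression. 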

Factor analytic models differ from the principal components solution in several 
respects. The statement of the model itself implies a radically different philosophy. 
Factor analysis tests a model that is based on the idea of partitioning the total 
variation into component parts: the common factor variance (the variance of 
Factor I + variance of Factor II + . . . etc.), the error variance (usually associated 
with measurement), and non-error variance specific to variables, excluded from the 
factor structure, or unique variance. In terms of an understanding of theoretical 
factors underlying some set of attributes, this would appear to be a realistic 
approach. This partition of the total variance also explains why the question of 
communality estimates is so important in factor analysis. It also explains why 
factor scores are only estimates of scale scores, compared to component scores. 
Finally, the idea of rotation of the factors is feasible and is theoretically of value in 
factor analysis. Rotation is carried out by changing the factor loadings so that some 
criterion, such as simple structure, is achieved or a best fit from a theoretical point 
of view is obtained. Note that the final communalities are not affected by rotation, 
although eigenvalues are. 
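
The closing remark on rotation can be checked numerically. The sketch below is an 
illustration only: it assumes Python with NumPy, an orthogonal (rigid) rotation of 
a two-factor solution, and hypothetical helper names. The "eigenvalue" of a factor is 
taken here as the column sum of squared loadings. 

```python
import numpy as np

def rotate(loadings, angle_degrees):
    """Rigidly rotate a two-factor loading matrix (m x 2) through an angle."""
    theta = np.radians(angle_degrees)
    transform = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
    return loadings @ transform

def communalities(loadings):
    """Row sums of squared loadings: variance of each variable accounted for."""
    return (loadings ** 2).sum(axis=1)

def factor_variances(loadings):
    """Column sums of squared loadings: variance attributed to each factor."""
    return (loadings ** 2).sum(axis=0)
```

For any loading matrix, communalities(rotate(L, angle)) equals communalities(L), 
while factor_variances changes with the angle; only the total of the factor variances 
is preserved. 
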
The references include several articles on the utility of factor analytic models, 
especially with reference to the social geography of the city (factorial ecology); see, 
especially, Berry, 1971; Rees, 1971; Janson, 1969. A general review of multivariate 
analysis is presented by Thompson (1970), while applications of this approach are 
given for physical geography by McCammon (1966) and Mather and Doornkamp 
(1970). A general problem is that factor analysis assumes that relationships between 
variables are linear, and thus their effects are additive. Many correlation and 
regression studies, particularly of residential structure of urban areas, have shown 
that non-linear relationships are more likely. The relative value of using factor 
analysis against traditional methods such as correlation can then be addressed 
(Meyer, 1971). Evidently, many of the conclusions in such an inquiry will relate to 
the level of generality that the researcher wishes to achieve. 

In Section 4B, the following texts are useful for factor analysis: Harman (1960); 
Lawley and Maxwell (1963); Horst (1965); and the two articles by Cattell (1965), 
particularly for the relative advantages of oblique and orthogonal solutions. Matrix 
algebra is treated in Fuller (1962), Hohn (1964), and Horst (1963), also cited in 
that section. 

SECTION 25 

PRINCIPAL COMPONENTS ANALYSIS 

Principal components analyses of the attribute matrix, defined as in Section 24, 
have been carried out for a number of purposes. An emphasis on structure, by 
examining the loadings matrix and associated mapped patterns of component 
scores, is illustrated in the studies by Blaikie (1971), Carey (1966) and Robson 
(1969). Another common procedure is to utilize the component scores as input to 
classification or regionalization schemes (Section ^2). Moser and Scott (1961) and 
Yamaguchi (1969) present typologies of urban centers, while Ahmad (1965) and 
King (1966) are concerned also with the regional aspects of urban system 
classification. The value of the approach for the study of economic regions is 
clearly stated by Brown and Trott (1968). It is also possible to re-analyze the scores 
obtained from an analysis, looking for higher order component solutions. This is 
sometimes accomplished by rotating the axes to an oblique solution, and then 
forming an orthogonal solution for the correlated scores. An alternative method is 
to analyze only a sub-set of the original number of unit areas. The study by Jones 
and Jones (1970) represents this approach, since they performed a principal 
components analysis on a sub-set of the towns used by Hadden and Borgatta, 1965 
(cited in Section 26). 

The techniques as described thus far concentrate on forming composite variables 
from the matrix of inter-correlations; i.e. the method essentially analyzes 
differences between the columns of the matrix. The general name for this approach 
is R-mode analysis. It is possible, however, to carry out the analysis for the rows of 
the matrix, in which case a Q-mode analysis applies. The results are "composites of 
unit areas," or rough indications of types of area, such as farm types (Henshall and 
King, 1966). It is also possible to use other measurement scales in factor analytic 
studies. The presence or absence (i.e. nominal scaling) of attributes, coded as a 
series of 1-0 measures, can be analyzed directly (see Berry, Barnum, and Tennant, 
cited in Section 26) or through an inter-correlation matrix for such binary 
data based on the phi-coefficient (Henshall, 1966). A similar phi-matrix was used 
by Garrison and Marble (1964) in their study of grouping tendencies among 
transportation nodes, since it was derived from the 1-0 connection matrix. 
Interaction matrices with elements representing actual flows between places can 
also be analyzed (Simmons, 1970) to produce common sets of origins and 
destinations. 

One under-utilized value of principal components analysis is the very fact that 
the resulting scores for the set of areas are independent of each other. As will be 
seen from the discussion of independent variables in regression analysis in Section 
27, the use of component scores would alleviate many problems of 
multicollinearity in general linear models. This approach is illustrated in the studies 
by Wong (1963) and Riddell (1970). 

SECTION 26 

FACTOR ANALYTIC STUDIES 



It is clear that many similar studies to those reported in the preceding section are 
cited here. For example, the use of factor scores as basic input for regionalization is 
described by Berry (1965) and Hadden and Borgatta (1965). Indeed, most of the 
studies are descriptive, rather than fulfilling the role that factor analytic models are 
presumed to have in the testing of theory. However, a fairly wide range of problem 
areas is represented, so that the general features of the analysis can be related to the 
interests that any student might have. The use of factor scores in general linear 
models is represented by the studies by Hartshorn (1971) and Lowry (1970). While 
these studies employ the factors as independent variables, it is obvious that they 
could be treated as dependent variables. One particularly interesting form of the 
linear model (trend-surface analysis; see Section 30) has been employed to 
partition the total variations of the factor over space into regional and local 
components (Goheen, 1970). Interaction matrices have also been analyzed using 
factor analysis, and if the results are then used in regionalization schemes a set of 
functional regions is obtained (Goddard, 1970; Illeris and Pedersen, 1968). 

One exception to the generally descriptive use of factor analysis in geographic 
research is seen in the study by Jeffrey, Casetti, and King (1969). Individual 
time-series (Section 31) were available for each city in a sample of midwestern 
metropolitan areas. The research hypothesis was that the set of series contained 
three levels of variation: 1) factors operating throughout the system; 2) factors 
common to cities in predetermined groups (on the basis of similarities in cyclical 
behavior); and 3) a factor unique to each city. Bi-factor theory is applicable in such 
a context, and for the unemployment series used in the study the hypothesis was 
substantiated. It is apparent that studies of this nature are important in testing 
evolving geographic theory, which must incorporate multivariate components of 
spatial problems. This study underlines the fact that analytical models of the types 
described in this and the prior section could be extremely valuable in such contexts, 
since the concept of parsimony in empirical research situations is incorporated into 
techniques used for testing theory. 

SECTION 27 

REGRESSION 



In contrast to correlation studies, functional relationships form the focus of 
interest in regression analysis. The most general statement of such a relationship is 
Y = f(X); the variable Y is said to be dependent on variations in X. Y is known as 
the dependent or response variable, while X is an independent (or control, 
predictor) variable. The importance of a strong theoretical framework for posing 
regression questions must be realized at the outset; for example, what is the form of 
the relationship, what are the natural process mechanisms at work, what are 
possible increments of Y given unit increases in X, what is the value of Y when 
X = 0? The first question raises the distinction between linear and non-linear 
models. 



The simple (i.e. bivariate) linear model is usually written $Y = \alpha + \beta X + \varepsilon$. 
$\alpha$ and $\beta$ are constants of regression. $\alpha$ is the value of Y when X = 0, the 
Y-intercept. $\beta$ is the regression coefficient, indicating the slope of the line, a direct 
relationship being indexed by a positive value for $\beta$, while a negative value shows an 
inverse relationship. The error terms ($\varepsilon$) represent that part of the variability in Y not 
accounted for by the linear relationship. It is assumed that these deviations are 
independent and normally distributed, with zero mean and unknown variance. In 
geographic research the mapped pattern of such residuals from regression may 
indicate that the expected random pattern is not found, so that this assumption in 
the model is not fulfilled. Non-linear relationships are often expressed in a linear 
form by means of transforming the original values for either X or Y or both, usually 
by logarithms. For example, an exponential relationship (semi-logarithmic) can be 
written as $\log Y = \log a + (\log b) X + \varepsilon$, while a power (or logarithmic) function 
($Y = aX^b$) can be expressed as $\log Y = \log a + b \log X + \varepsilon$. If these simple 
transformations do not suffice, then a polynomial regression may be carried out 
(Section 30). 
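
The two logarithmic transformations can be illustrated with a short sketch. It 
assumes Python with NumPy, base-10 logarithms, and strictly positive data; the 
function names are hypothetical and are not drawn from the cited texts. 

```python
import numpy as np

def fit_power_function(x, y):
    """Fit Y = a * X**b by least squares on log Y = log a + b log X."""
    # polyfit returns the slope first, then the intercept.
    b, log_a = np.polyfit(np.log10(x), np.log10(y), 1)
    return 10.0 ** log_a, b

def fit_exponential(x, y):
    """Fit Y = a * b**X by least squares on log Y = log a + (log b) X."""
    log_b, log_a = np.polyfit(x, np.log10(y), 1)
    return 10.0 ** log_a, 10.0 ** log_b
```

Note that the least squares criterion is applied to the transformed values, so the 
fitted constants minimize deviations in log Y rather than in Y itself. 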

Whatever the chosen form of relationship, the paired observations for each areal 
unit can be represented in a graphical format as a scatter diagram, usually with Y 
plotted on the ordinate and X on the abscissa. As indicated above, the association 
between X and Y is written as a mathematical expression, so that in the elemental 
form this appears to be a case of fitting a curve to the set of points. At such a stage, 
then, there are no inferential questions involved, but in choosing one of the infinite 
number of possible lines that could be drawn through the set the concept of an 
averaged relationship is employed. It is assumed that there is a sample distribution 
of Y values for each X, which is usually regarded as fixed; for example, for the 
fixed $X_i$ there is a range of values of Y, and the observed value ($Y_i$) is regarded as 
randomly drawn from that distribution. The set of $Y_i$ distributions are assumed to 
be equally variable for the set of X values. For any X we assume that differences in 
the Y values are due to sampling error, and seek to reduce the effect of this 
variation by taking the mean of Y given X. In this sense the method used results in 
an averaged regression line. For a full discussion of the assumptions in regression 
analysis, see Poole and O'Farrell (1971). 

The notions described above can be realized with the criterion that the 
deviations of points from the regression line should be minimized; this is achieved 
by the method of least squares. The sum of squares of the deviations (error terms) 
is used rather than the absolute value sum because it can be differentiated to find 
its minimum. The method also results in the maximum likelihood estimates for the 
regression constants, and it is easily extended to the non-linear case. The two 
normal equations produced are solved for $\alpha$ and $\beta$; substituting these values in the 
equation results in a set of estimated values of the dependent variable ($Y_e$) which 
can be compared to the observed values ($Y_i$). For worked examples, see King 
(Reference, Section 4A, pp. 121-122; 135-139). 
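
As a minimal sketch of the procedure, the two normal equations can be set up and 
solved directly. Python with NumPy is assumed, and the function name 
least_squares_line is hypothetical; the worked examples in King remain the 
authoritative treatment. 

```python
import numpy as np

def least_squares_line(x, y):
    """Solve the two normal equations of the simple linear model for a and b."""
    n = len(x)
    # Normal equations:
    #   sum(Y)  = n * a      + b * sum(X)
    #   sum(XY) = a * sum(X) + b * sum(X**2)
    coefficients = np.array([[n, x.sum()],
                             [x.sum(), (x ** 2).sum()]])
    constants = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(coefficients, constants)
    return a, b

# Hypothetical observations for five areal units.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = least_squares_line(x, y)
estimated = a + b * x        # the estimated values Y_e
deviations = y - estimated   # the error terms, which sum to zero
```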

The concept of partitioning the total variation of the Y values into component 
parts is again used in regression analysis. The total sum of squares, $\sum (Y_i - \bar{Y})^2$, for 
the dependent variable can be thought of as attributable entirely to random effects 
when treated by itself. However, in regression analysis, some of this variation is due 
to the known relationship between Y and X, and an expression for the sum of 
squares due to regression (explained variation) is derived, $\sum (Y_e - \bar{Y})^2$. The 
difference between total and regression sums of squares, then, is the unexplained 
variation, $\sum (Y_i - Y_e)^2$. 

Two related questions may be raised at this point: 1) how important is the 
relationship between X and Y, and 2) how good is the fit of the line? The first 
question is treated by forming the ratio of the regression sum of squares to the total 
sum of squares, which is the coefficient of determination ($r^2$) described in the 
earlier discussion of correlation. The square root of this ratio is, of course, the 
correlation coefficient. We note here that the regression and correlation 
coefficients always have the same sign. The second question really asks how large, 
on the average, are the deviations? The standard error of estimate answers this 
question, providing an indication of the variability of the scatter of points about 
the regression line and allowing the construction of confidence intervals or error 
bands. A standard error for the regression coefficient is also derived, and can be 
used to test the research hypothesis that the slope of the line is significant 
($H_0: \beta = 0$). Naturally, a properly designed sample is necessary for the application 
of any inferential tests. 
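
The partition of the sums of squares and the measures derived from it can be 
sketched as follows. This is an illustration only, assuming Python with NumPy; the 
function name regression_summary is hypothetical, and the use of n - 2 degrees of 
freedom in the standard error of estimate is an assumption (some texts divide by n). 

```python
import numpy as np

def regression_summary(x, y):
    """Partition the variation in Y and derive r-squared and standard errors."""
    n = len(x)
    b, a = np.polyfit(x, y, 1)                 # slope, then intercept
    y_e = a + b * x                            # estimated values
    total_ss = ((y - y.mean()) ** 2).sum()
    regression_ss = ((y_e - y.mean()) ** 2).sum()   # explained variation
    residual_ss = ((y - y_e) ** 2).sum()            # unexplained variation
    r_squared = regression_ss / total_ss
    # The correlation coefficient takes the sign of the regression coefficient.
    r = np.sign(b) * np.sqrt(r_squared)
    # Standard error of estimate (assumed divisor: n - 2).
    se_estimate = np.sqrt(residual_ss / (n - 2))
    # Standard error of the slope, and the t ratio for H0: beta = 0.
    se_slope = se_estimate / np.sqrt(((x - x.mean()) ** 2).sum())
    return {
        "total_ss": total_ss,
        "regression_ss": regression_ss,
        "residual_ss": residual_ss,
        "r_squared": r_squared,
        "r": r,
        "se_estimate": se_estimate,
        "se_slope": se_slope,
        "t_slope": b / se_slope,
    }
```

The t ratio returned would be referred to tabled values of the t-distribution with 
n - 2 degrees of freedom; that step is omitted here. 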

The simple linear model generally has limited value in most geographic research 
situations, unless it is suspected that one factor is the main determinant of observed 
variations in the dependent variable. The general linear model is more commonly 
employed, and it may be written as follows: $Y = \alpha + \sum_{i=1}^{m} \beta_i X_i + \varepsilon$, where m is 
the number of independent variables and $\varepsilon$ is defined as for the simple case. For 
example, with three independent variables, the model is 

$Y = \alpha + \sum_{i=1}^{3} \beta_i X_i + \varepsilon = \alpha + b_{Y1.23} X_1 + b_{Y2.13} X_2 + b_{Y3.12} X_3 + \varepsilon$. 

It is clear that the regression coefficients are expressed in a different manner from the 
simple model; for this analysis they are known as partial regression coefficients. The 
concept of a partial correlation coefficient was introduced in Section 21. Similar 
reasoning applies to partial regression coefficients: the rate of change in Y for a 
unit change in any independent variable is computed, holding constant the effects 
of the other independent variables statistically. As they stand, partial regression 
coefficients cannot be compared since their values are affected by the particular 
metric employed. Normally, then, the researcher computes standardized regression 
coefficients; for example, the first partial regression coefficient in the three variable 
case above can be standardized as a beta coefficient, $\beta_1 = b_{Y1.23}(S_1/S_Y)$, where 
$S_1$ and $S_Y$ are the standard deviations of $X_1$ and Y. (See King, p. 140.) Beta 
values may be compared directly in order to evaluate the relative importance of each 
independent variable. 
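
The standardization can be sketched briefly. The code assumes Python with NumPy; 
the function name is hypothetical, and the beta coefficients are formed by the usual 
rule of multiplying each partial regression coefficient by the ratio of the standard 
deviation of its independent variable to that of Y. 

```python
import numpy as np

def partial_and_beta_coefficients(x, y):
    """Fit Y on the columns of x (n x k); return intercept, partial b's, betas."""
    # Design matrix with a leading column of ones for the intercept.
    design = np.column_stack([np.ones(len(y)), x])
    solution, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercept, partial = solution[0], solution[1:]
    # Standardize: b_i * (S_i / S_Y), removing the effect of the metric used.
    betas = partial * x.std(axis=0, ddof=1) / y.std(ddof=1)
    return intercept, partial, betas
```

The same beta values are obtained if all variables are first converted to standard 
scores and the regression is then computed, which is a convenient check. 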

Least squares methods are also employed in fitting a plane (for the two 
independent variable case) or a hyperplane (otherwise) to the scatter of points. In 
addition, most modern computer programs use matrix solutions of the sets of 
normal equations; the texts by King and by Krumbein and Graybill have good 
discussions of these methods. The overall significance of the multiple regression can 
be tested using an analysis of variance approach, and significance tests (using the 
t-statistic) are available for individual partial regression coefficients, as well as for 
differences between any two coefficients. Some of the problems associated with 
this technique have been discussed previously; for example, the question of 
modifiable units in Section 21. Spatial autocorrelation is taken up in Section 29, 
and the inclusion of regional effects, i.e. groupings of units, is described in Section 
28 (Analysis of Covariance). A further problem is that the model assumes 
independence in the predictor variables. Lack of independence is known as the 
problem of multicollinearity, and we have already noted that one value of a 
principal components or factor analysis of a set of variables is that a number of 
uncorrelated variables result. 

The listing of references clearly shows the extent of the application of regression 
techniques in geography. Almost every sub-field including cultural geography 
(Sopher, 1968) is represented. Besides any particular methodological inquiry, then, 
the references can be used to examine theoretical constructs. For example, the role 
of the distance factor in gravity models is illustrated by Helvig's (1964) study of 
truck movements, while there are many other references to the frictional effects of 
distance in the cited articles dealing with population density and land value patterns 
in cities, or in the field of internal migration. It is often useful to follow through an 
author's use of several techniques, placing them into perspective (Blaikie). 
The use of regression techniques in fitting model parameters has already been 
described in Section 15, but Casetti, King, and Jeffrey (1971) provide a further 
example. The partial correlation coefficients are also employed in step-wise 
regression procedures, in which a sub-set of the total number of independent 
variables is chosen in order of importance of explaining the variability in the 
dependent variable. This method is illustrated in the studies by Reed (1967), Brunn 
and Hoffman (1970), and Olsson (1965). 

The mapping of residuals from regression was originally suggested by Thomas 
(1960) as a means for evaluating other potential independent variables which had 
not been included in the original model formulation. As suggested above, a 
non-random pattern of residuals really means that the assumptions of the technique 
are not fulfilled, and it may indicate that non-linear relationships are present, or 
that the model itself should have an autoregressive structure built into it. 
Certainly, any application of inferential tests would be inappropriate. Residual 
maps are still employed frequently in the literature, especially in a standardized 
form (Logan, 1964; Mueller, 1970). 

SECTION 28 

ANALYSIS OF COVARIANCE 

In analysis of variance models (Section 17) the groups may not be randomized in 
terms of the criterion variable, and if this is the case pre-treatment differences are 
said to exist. The original use of the analysis of covariance was to take out such 
pre-treatment differences. As employed in geographic research, however, a different 
interpretation is made. For example, Thomas (1960) studied the differential growth 
of suburban areas in Chicago, and it appeared that suburbs located in certain sectors 
had higher growth rates than those in other sectors. A sectoral effect could then be 
hypothesized, and the inclusion of this into the model should increase the level of 
explanation. Regional effects are obviously of the same genre. Note that in essence 
this factor is a nominal scale, and as such it cannot be included in ordinary 
regression models. The use of dummy variates, where a unit area would have a value 
of 1 if it was in the sector or region represented by the variable, and 0 otherwise, is 
one way in which attempts have been made to account for the effect. There are, 
however, certain technical difficulties associated with the use of dummy variates; 
see especially Lansing and Morgan, 1971 (pp. 314-343). 

The analysis of covariance is more effective in such situations. It combines the 
analysis of variance and regression techniques, and is thus a rather powerful 
inferential model. To compensate for this for the applied researcher, however, the 
assumptions that must be fulfilled are rather restrictive. The analysis tests for the 
influence of groups (sectors, regions) on the level of explanation in the total 
regression model. Two assumptions must be fulfilled before the test can be made: 
1) the variances in the dependent variable do not differ from group to group 
(Bartlett's test, Section 16); 2) the group regression lines are parallel 
($H_0: \beta_1 = \beta_2 = \ldots = \beta_k$, for k groups), for which the test statistic follows the 
F-distribution. If these assumptions are fulfilled, the only way in which the group 
regressions can differ is by having Y-intercepts at different elevations; this is tested 
by an F-statistic with reference to a common regression line. If the null hypothesis is 
rejected, a regional effect can be inferred, in the light of acknowledged functional 
relationships. 
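
One way to sketch the two F-ratios is by comparing residual sums of squares from 
three nested regressions: a common line, parallel lines with separate intercepts, and 
fully separate lines. This is an illustrative formulation under stated assumptions, 
not necessarily the computational layout used in the cited studies. It assumes 
Python with NumPy and a single independent variable; the function names are 
hypothetical, and Bartlett's test for equal variances is not included. 

```python
import numpy as np

def _residual_ss(design, y):
    """Residual sum of squares from a least squares fit of y on design."""
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coefficients
    return float(residuals @ residuals)

def covariance_analysis(x, y, groups):
    """F-ratios for parallel slopes and for equal intercepts across k groups."""
    x, y, groups = np.asarray(x, float), np.asarray(y, float), np.asarray(groups)
    labels = np.unique(groups)
    k, n = len(labels), len(y)
    # One 1-0 dummy column per group (sector or region).
    dummies = (groups[:, None] == labels[None, :]).astype(float)
    common = np.column_stack([np.ones(n), x])                    # one line
    parallel = np.column_stack([dummies, x])                     # k intercepts
    separate = np.column_stack([dummies, dummies * x[:, None]])  # k lines
    ss_common = _residual_ss(common, y)
    ss_parallel = _residual_ss(parallel, y)
    ss_separate = _residual_ss(separate, y)
    # Test 1: are the group regression lines parallel?
    f_slopes = ((ss_parallel - ss_separate) / (k - 1)) / (ss_separate / (n - 2 * k))
    # Test 2: given parallel lines, do the intercepts differ?
    f_intercepts = ((ss_common - ss_parallel) / (k - 1)) / (ss_parallel / (n - k - 1))
    return {
        "f_slopes": f_slopes, "df_slopes": (k - 1, n - 2 * k),
        "f_intercepts": f_intercepts, "df_intercepts": (k - 1, n - k - 1),
        "residual_ss": (ss_common, ss_parallel, ss_separate),
    }
```

Each F-ratio would be compared with tabled values for the degrees of freedom 
returned; a large value of f_intercepts, with f_slopes not significant, corresponds 
to the inference of a regional effect described above. 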

The analysis of covariance would appear to be an important method whereby 
regional effects can be evaluated (King, 1961; Kariel, 1963). It is clear, however, 
that the usual caveat of correctly designed sampling frameworks is needed in order 
to satisfy the assumptions. 

SECTION 29 

SPATIAL AUTOCORRELATION 

If the assumptions of the general linear model described in Section 27 are 
fulfilled, then the expected spatial pattern of residuals from regression (the error 
terms in the model, assumed to be independent and normally distributed) is a 
random one. Any lack of randomness is called a spatial autocorrelation effect. 
Techniques used to study this phenomenon need not apply only to residuals, 
however. The distribution of any one of the original values may be affected by its 
neighboring points in any direction. In this sense, the geographic problem differs 
considerably from usual serial correlations met with in analyzing data organized 
along one dimension, i.e. time-series (Section 31) where the value at a point is 
dependent only on previous values (see particularly Curry, 1970). 

In this section, methods which are not directly based on concepts from 
time-series analysis are referenced, although it should be noted that there are many 
linkages between these methods and those employed in the analysis of trend 
surfaces (Section 30) and spatial series (Section 31). The original work in this area 
was by Geary (1954), who was concerned with the non-randomness of data values 
for neighboring counties as well as the possible effects on regression analysis. In 
order to estimate this effect, Geary devised a contiguity ratio (C), which, as the 
name suggests, incorporated the number of connections between any county and 
others in the study area. If C = 1, the distribution is regarded as random (i.e. no 
autocorrelation), and a sampling theory based on the normal distribution was 
derived. A distinction is made between a randomization approach, in which the set 
of areal units is regarded as the universe, and the normal approach in which the 
units are assumed to be a random sample from a parent population that is normally 
distributed. This distinction is carried over to more recent work in the differences 
between non-free and free sampling methods and theory. 

Dacey (1968) generalized Geary's work with special reference to the regression 
residuals problem. Over- and under-predictions were categorized into Black and 
White, the probabilities of BB, BW, and WW joins were evaluated, and contiguity 
could then be tested with reference to the standard normal curve. The arrangement 
of points per cell, as used in point pattern analysis (Section 20), can also be 
evaluated for randomness, using a non-parametric test. Recent work by Cliff and 
Ord has considerably extended this earlier research, placing the results into a larger 
inferential framework and deriving the sampling distribution of a spatial 
autocorrelation coefficient under different sampling schemes. Interesting by-products 
of their efforts are 1) the possibility of employing the statistic in regionalization 
schemes, and 2) that size of areal units might be specified in order to minimize the 
dependence of data values on neighboring points. 
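
A minimal sketch of the contiguity ratio follows, using the form in which it is 
commonly written today: C = (n - 1) ΣΣ w_ij (x_i - x_j)^2 / (2 W Σ (x_i - x̄)^2), 
where w_ij = 1 if areas i and j are connected and 0 otherwise, and W is the sum of 
all w_ij. Python with NumPy is assumed, the function name is hypothetical, and 
the sampling theory needed for a significance test is not included. 

```python
import numpy as np

def contiguity_ratio(values, connections):
    """Geary-type contiguity ratio for values on areal units.

    connections is a symmetric n x n matrix of 1-0 entries, with 1 where
    two units are contiguous and 0 on the diagonal.
    """
    values = np.asarray(values, dtype=float)
    w = np.asarray(connections, dtype=float)
    n = len(values)
    # Squared differences between every ordered pair of units.
    squared_differences = (values[:, None] - values[None, :]) ** 2
    numerator = (n - 1) * (w * squared_differences).sum()
    denominator = 2.0 * w.sum() * ((values - values.mean()) ** 2).sum()
    return numerator / denominator

# Hypothetical example: six units arranged in a chain, each joined to its
# immediate neighbors only.
chain = np.zeros((6, 6))
for i in range(5):
    chain[i, i + 1] = chain[i + 1, i] = 1.0
smooth = contiguity_ratio([1, 2, 3, 4, 5, 6], chain)       # similar neighbors
alternating = contiguity_ratio([1, 6, 1, 6, 1, 6], chain)  # dissimilar neighbors
```

Values well below 1 indicate that neighboring units are alike, and values above 1 
that they are unlike, with 1 as the expectation for a random arrangement. 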

CLIFF, A. D. "The Neighbourhood Effect in the Diffusion of Innovations," 
Transactions of the Institute of British Geographers, No. 44, 1968, pp. 
75-84. 

CLIFF, A. D. "Computing the Spatial Correspondence Between Geographical 

in Berry, B. J. L. and D. F. Marble (eds.), Spatial Analysis: A Reader in 
Statistical Geography. Englewood Cliffs, N.J.: Prentice-Hall, 1968, pp. 
479-495. 

Statistician, Vol. 5, 1954, pp. 115-145. 

SEE ALSO: Section 11. Matern (1960). 
Section 25. Reynolds and Archer (1969). 

SECTION 30 

TREND SURFACE ANALYSIS 



A particularly interesting form of the general linear model described in Section 
27 has been developed largely by geologists. Krumbein and Graybill (1965) specify 
the model as follows: 

$Z(U_i, V_j) = T(U_i, V_j) + e_{ij}$ 

where $Z(U_i, V_j)$ is the observed value of a mapped variable at orthogonal grid 
locations $U_i$ and $V_j$, $T(U_i, V_j)$ is the trend surface component, and $e_{ij}$ 
represents a random error term. The latter two terms are also called regional and 
local components in the literature: once again, a partition of variation. 

Thus, we are dealing with a multiple regression problem, with two independent 
variables. Polynomial regression is often employed to estimate the coefficients 
(Tobler, 1964). The first-order polynomial surface is simply the linear trend 
($T(U_i, V_j) = \beta_{00} + \beta_{10} U_i + \beta_{01} V_j$), while the second-order (quadratic) 
surface would then be written as 

$T(U_i, V_j) = \beta_{00} + \beta_{10} U_i + \beta_{01} V_j + \beta_{20} U_i^2 + \beta_{11} U_i V_j + \beta_{02} V_j^2$, 

with the error term $e_{ij}$ added in each case to give the observed value. 

Note that the computation of the $\beta$-coefficients is considerably eased if the data are 
referenced by means of equally-spaced intervals, in which case tabled values of 
orthogonal polynomials can be used. If cyclic fluctuations in the trend are 
suspected, double Fourier series can also be computed directly in a similar manner 
(Harbaugh and Sackin, 1968; see also Section 31). With irregularly spaced data 
values, least squares methods are employed to estimate the coefficient values. Many 
of the listed references are for computer programs which are based on least squares; 
see the review by Harbaugh and Merriam (1968). 
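
For irregularly spaced data, the least squares fit of a polynomial surface can be 
sketched as below. This is an illustration under stated assumptions, not a 
reproduction of any of the listed computer programs: it uses Python with NumPy, 
the function name trend_surface is hypothetical, and the coefficients are returned 
in the order of the terms built in the first line of the function (for order 1: 
constant, V, U). 

```python
import numpy as np

def trend_surface(u, v, z, order=1):
    """Fit a polynomial trend surface of the given order by least squares."""
    # All terms U**i * V**j with i + j <= order, e.g. 1, V, U for order 1
    # and 1, V, V**2, U, U*V, U**2 for order 2.
    terms = [u ** i * v ** j
             for i in range(order + 1)
             for j in range(order + 1 - i)]
    design = np.column_stack(terms)
    coefficients, *_ = np.linalg.lstsq(design, z, rcond=None)
    trend = design @ coefficients        # regional component
    residuals = z - trend                # local component
    # Proportion of the total sum of squares accounted for by the trend.
    explained = 1.0 - (residuals ** 2).sum() / ((z - z.mean()) ** 2).sum()
    return coefficients, trend, residuals, explained
```

The residuals returned are the local component that would be mapped, and the 
explained proportion is the quantity usually compared between successive orders. 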

Applications of this technique have increased greatly in geography in recent 
years (Chorley and Haggett, 1965; Norcliffe, 1969). For example, trend surfaces 
have been computed for residuals from regression (Fairbairn and Robinson, 1967) 
and used in the comparison of intra-regional structures (Haggett, 1967; but 
compare Macomber, 1971). In physical geography the study of erosion surfaces has 
been resurrected by trend analysis (see King, 1969; Rodda, 1970; Smith et al., 
1969; Thornes and Jones, 1969; and the critical review by Tarrant, 1970). The 
relationships between trend analysis and smoothing (filtering) functions, and their 
use in map comparisons, are well explained by Tobler (1969). 

Several problems are apparent. First, the trend will, of course, be computed only 
for that set of locations included in the study area. Selection criteria for the latter 
are therefore most important. Second, there are difficulties in establishing the 
significance of a trend at order k, say, compared to order k + 1 (Chayes, 1970). The 
usual procedure is to form a ratio of explained variance to total variance (reduction 
in total sum of squares), perhaps testing for significance using the F-distribution 
(Unwin, 1970). Howarth's (1967) experiments with random data, however, indicate 
that this procedure may be misleading for lower-order surfaces (up to the cubic). 

University of Michigan, 1968. 

CHAYES, F. "On Deciding Whether Trend Surfaces of Progressively Higher Order 
are Meaningful," Bulletin, Geological Society of America, Vol. 81, 1970, pp. 

SECTION 31 

TEMPORAL AND SPATIAL SERIES 

The analysis of a series of events ordered in one dimension, usually time, has
long attracted the attention of econometricians, statisticians, and mathematicians.
In the latter case, oscillations are treated as periodic functions and there is little
attention to the error component in a time series, such as that for precipitation over
a year. Statistical considerations are important when prediction is involved and
error estimates have to be made. A typical procedure in time-series analysis would
be to remove the trend (e.g. by regression methods; Barrett, 1966), establish any
seasonal or cyclic effect, and then analyze the residual error terms. Runs tests,
based on a series of + and - terms with respect to the median, can also be carried
out for randomness. Thus, the concept of a partition of the total variation of a set
of values is also applied here.
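
The procedure just outlined (remove the trend, establish the seasonal effect,
examine the residuals, and test the residuals for randomness) might be sketched as
follows. The sketch assumes an additive model, a straight-line trend fitted by
regression on time, and a seasonal effect estimated as the mean detrended value at
each position in the cycle; the series is synthetic and the function names are
hypothetical. The runs test uses the usual large-sample normal approximation.

```python
import math


def linear_trend(series):
    """Least squares intercept and slope of the series against time 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def decompose(series, period):
    """Partition a series into trend, seasonal and residual parts (additive)."""
    intercept, slope = linear_trend(series)
    trend = [intercept + slope * t for t in range(len(series))]
    detrended = [y - tr for y, tr in zip(series, trend)]
    # Seasonal effect: mean detrended value at each position in the cycle.
    seasonal_means = []
    for k in range(period):
        vals = detrended[k::period]
        seasonal_means.append(sum(vals) / len(vals))
    seasonal = [seasonal_means[t % period] for t in range(len(series))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


def runs_test(values):
    """Runs test about the median; returns (runs, expected runs, z score)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    median = (ordered[mid] if len(ordered) % 2
              else (ordered[mid - 1] + ordered[mid]) / 2)
    signs = [v > median for v in values if v != median]   # ties are dropped
    n_plus = sum(signs)
    n_minus = len(signs) - n_plus
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n_plus + n_minus
    expected = 2 * n_plus * n_minus / n + 1
    variance = (2 * n_plus * n_minus * (2 * n_plus * n_minus - n)
                / (n ** 2 * (n - 1)))
    return runs, expected, (runs - expected) / math.sqrt(variance)


# Synthetic monthly series for three years: trend + annual cycle + irregular part.
noise = [0.3, -0.8, 0.5, -0.1, 0.9, -0.4, 0.2]
series = [0.5 * t + 10 * math.sin(2 * math.pi * t / 12) + noise[t % 7]
          for t in range(36)]
trend, seasonal, residual = decompose(series, period=12)
runs, expected, z = runs_test(residual)
print(runs, round(expected, 2), round(z, 2))
```

A z score far from zero in either direction would suggest that the residuals are
not random, i.e. that some systematic component remains in the series.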

The cyclic component is usually analyzed by harmonic or Fourier series, a
mathematical expression consisting of terms containing sines and cosines. The
assumption that is generally made in this case is that the series is stationary:
statistical properties of the distribution (moment measures and autocorrelation
functions) are constant throughout the range of the data. Single Fourier series are
used to describe periodic time series, the shape of the curve depending on the number
of terms used and the values of the coefficients in the terms. The method appears to
be particularly suitable in precipitation climatology (Horn and Bryson, 1960;
Sabbagh and Bryson, 1962). Bryson and Dutton (1967) summarize the value of the
approach. At first, major irregularities are removed by smoothing the series, using
the technique of moving averages. The remaining finite set of averages can be
completely described by a harmonic function, and the number of cycles determines
the order of the series. In many cases the second harmonic suffices in terms of
description, i.e. semi-annual cycles are evidently most common in this branch of
climatology. Maps can then be made of the phase angle and amplitude of each
harmonic, and these can be used in regional studies. Note that knowledge of
cyclical behavior, a prerequisite to meaningful application of the methods, is well
advanced in climatology compared to many other branches of geography.
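
The amplitude and phase angle that would be mapped for each harmonic can be
computed directly from twelve monthly values. The following is a minimal sketch,
not a reproduction of any cited program: the function names are hypothetical, the
phase convention (a cosine term with the phase angle subtracted) is one of several
in use, and the monthly values are synthetic, built from two known harmonics so
that the result can be checked against the inputs. The preliminary smoothing by
moving averages is omitted.

```python
import math


def harmonic(values, k):
    """Amplitude and phase angle (radians) of the k-th harmonic of one cycle."""
    n = len(values)
    a = 2.0 / n * sum(v * math.cos(2 * math.pi * k * t / n)
                      for t, v in enumerate(values))
    b = 2.0 / n * sum(v * math.sin(2 * math.pi * k * t / n)
                      for t, v in enumerate(values))
    # The harmonic is amplitude * cos(2*pi*k*t/n - phase).
    return math.hypot(a, b), math.atan2(b, a)


def variance_share(values, k):
    """Share of the total variance accounted for by harmonic k (k < n / 2)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    amplitude, _ = harmonic(values, k)
    return (amplitude ** 2 / 2) / variance


# Synthetic monthly means: an annual cycle plus a weaker semi-annual cycle.
monthly = [50 + 20 * math.cos(2 * math.pi * t / 12 - 1.0)
           + 8 * math.cos(2 * math.pi * 2 * t / 12 - 2.0) for t in range(12)]
for k in (1, 2):
    amplitude, phase = harmonic(monthly, k)
    print(k, round(amplitude, 3), round(phase, 3),
          round(variance_share(monthly, k), 3))
```

Repeating the calculation station by station gives the values from which maps of
amplitude and phase angle could be drawn.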

In two dimensions, as noted before, this simple approach has to be modified to
account for directional influences on the values at any point. Trend can be handled
by methods described in the preceding section, and double Fourier series can be
employed for two-dimensional periodic phenomena. Casetti (1966) illustrates this
approach in terms of the effects of different sized areal units acting as filters in
producing different harmonics, and the implications of this for correlation and
regression analysis. Granger (1969) discusses the relationships between time- and
space-series. Recent research efforts have been expended in applying more advanced
techniques to geographic problems. Interest has centered upon the use of spectral
analysis (Rayner, 1967, 1971; Bassett and Tinline, 1970), which is concerned with
the identification of amplitudes and frequencies of component cycles making up
the periodic portion of the series. A measure of correspondence between two
spatial series (coherence) can be computed for each band. The value of this
approach is that coherence can then be looked at for varying scales (Rayner, 1971),
which would be extremely beneficial in terms of problems faced in the usual
geographic applications of correlation analysis.
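
A rough sketch of these ideas for two series observed at equally spaced points
along a transect is given below. It is an illustration only: the raw periodogram
is smoothed by a simple running mean over neighbouring frequencies, which is one
crude choice among many spectral windows, and the two series, the band width, and
the function names are all assumptions made here.

```python
import cmath
import math


def dft(series):
    """Discrete Fourier transform of a mean-removed series (direct O(n^2) sum)."""
    n = len(series)
    mean = sum(series) / n
    centred = [v - mean for v in series]
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(centred))
            for k in range(n // 2 + 1)]


def band_average(values, half_width):
    """Average each spectral estimate with its neighbouring frequencies."""
    out = []
    for k in range(len(values)):
        lo = max(0, k - half_width)
        hi = min(len(values), k + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def coherence(x, y, half_width=2):
    """Squared coherence of two equal-length series for each frequency band."""
    fx, fy = dft(x), dft(y)
    sxx = band_average([abs(a) ** 2 for a in fx], half_width)
    syy = band_average([abs(b) ** 2 for b in fy], half_width)
    sxy = band_average([a * b.conjugate() for a, b in zip(fx, fy)], half_width)
    return [abs(c) ** 2 / (p * q) if p > 0 and q > 0 else 0.0
            for c, p, q in zip(sxy, sxx, syy)]


# Two synthetic spatial series sharing a cycle of eight units' wavelength.
n = 64
x = [math.sin(2 * math.pi * 8 * t / n) + 0.3 * math.sin(2 * math.pi * 3 * t / n)
     for t in range(n)]
y = [0.8 * math.sin(2 * math.pi * 8 * t / n + 0.5)
     + 0.3 * math.sin(2 * math.pi * 13 * t / n) for t in range(n)]
power_x = [abs(c) ** 2 for c in dft(x)]
peak = max(range(len(power_x)), key=power_x.__getitem__)
coh = coherence(x, y)
print(peak, round(coh[peak], 3))
```

Without some averaging over bands the squared coherence is identically one at
every frequency, which is why the smoothing step cannot be left out. With purely
deterministic series such as these, the estimates in bands where neither series
has appreciable power are not meaningful.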

While it is clear that climatological research is particularly likely to benefit from
applications of spectral techniques, since physical processes can be directly inferred,
other branches of geography are likely to be influenced by the approach. For
example, central place theory indicates a certain periodicity in the spacing of
settlements. Tobler (1969) has examined the spectrum of population densities
along U.S. 40 from Baltimore to San Francisco in this context. The diffusion of an
innovation (agricultural subsidies), as modelled by Hagerstrand, has been studied by
Barton and Tobler (1971) by means of an optical analogue to estimate changes in
spectral densities over time. The researcher's potential ability to specify processes
operating at different scales is a likely immense benefit of this approach, although
the necessity of a strong theoretical framework for any empirical study is also
underlined.

SECTION 32



The purpose of classification is to produce groups or clusters of unit areas in
which the within-group distance or variance is minimized, and accordingly
between-group variance is maximized. Dimensional analysis is employed to
determine the distances separating areas in the space defined by the attributes
under study. The most efficient technique would utilize orthogonal axes to define
the space, since the theorem of Pythagoras can then be called upon to define
distances. Most of the multivariate classifications accordingly are based upon the
factor scores from prior analysis, since the new variables have standardized scales
and are made orthogonal to each other. The distance measurements can be made in
a space of any dimensions, and they are used to index similarity between areas;
units located closer to each other are more similar. A distance matrix (of order
n X n, say, where n is the number of areas) which is square and symmetric can thus
be derived, and the grouping routine operates on it in an hierarchical manner. At
step one, there will be n single member groups. The matrix is searched to find the
smallest distance, and the respective row and column pertaining to that observation
are combined, producing an n-1 X n-1 matrix. The steps are repeated until there is
only one group, containing all the unit areas. The within-group variance, equal to
zero at the first step, successively increases, while the between-group variance is
decreased.
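
The stepwise routine can be sketched as follows. The text does not say how the
combined row and column are to be recomputed, so the sketch assumes that a merged
group is represented by the centroid of its members; other linkage rules would
give other groupings. The factor scores for the five unit areas are hypothetical,
as are the function names.

```python
import math


def euclidean(a, b):
    """Pythagorean distance between two points in a space of any dimension."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def centroid(members, scores):
    """Mean position of the unit areas belonging to one group."""
    dims = len(scores[members[0]])
    return [sum(scores[m][d] for m in members) / len(members)
            for d in range(dims)]


def within_group_ss(groups, scores):
    """Total squared distance of unit areas from their own group centroid."""
    total = 0.0
    for g in groups:
        c = centroid(g, scores)
        total += sum(euclidean(scores[m], c) ** 2 for m in g)
    return total


def hierarchical_grouping(scores):
    """Merge the two closest groups at each step; return every grouping."""
    groups = [[i] for i in range(len(scores))]   # step one: n single-member groups
    history = [([g[:] for g in groups], 0.0)]
    while len(groups) > 1:
        centres = [centroid(g, scores) for g in groups]
        pairs = [(euclidean(centres[i], centres[j]), i, j)
                 for i in range(len(groups))
                 for j in range(i + 1, len(groups))]
        _, i, j = min(pairs)                     # smallest distance in the matrix
        merged = sorted(groups[i] + groups[j])
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
        history.append(([g[:] for g in groups], within_group_ss(groups, scores)))
    return history


# Hypothetical factor scores on two orthogonal factors for five unit areas.
scores = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.1), (3.2, 2.9), (8.0, 0.5)]
history = hierarchical_grouping(scores)
for grouping, ss in history:
    print(grouping, round(ss, 3))
```

The printed within-group sum of squares begins at zero and rises at every step,
which is the loss of detail that has to be weighed in deciding where to stop.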

The method described is only one of a number of alternative algorithms available
for classification (Lankford, 1969). It is, however, one of the most commonly
reported in the literature, largely because of the influence of Berry, whose seminal
paper in 1961 reinvigorated long-standing interest in geography in the allied
problem of regionalization (Grigg, 1965). The methodology is dependent on
techniques of numerical taxonomy (see Sokal and Sneath, 1963, cited in Section
4B), and is fully described by Berry (1967). The types of grouping schemes
resulting from this analysis are dependent on the nature of the input information. If
an attribute matrix is used, the methods will produce regional types of the formal
kind, in the absence of any strong contiguity in the original data. The solution can
be made into a set of non-overlapping regions by the addition of contiguity or
compactness constraints on the allocation of unit areas to existing groups.
Alternatively, a set of functional regions is produced using as input an interaction
matrix. Spence and Taylor (1970) provide an overview of the various alternatives.

A fair degree of subjectivity is evident in the methods reported in the literature
(Johnston, 1968). There is, for example, a choice to be made in the coefficient of
association that is used, as well as the type of algorithm. A difficult problem to
resolve is how many groups should be included in the final solution. Discriminant
analysis (Section 33) can provide some aid in this respect. Johnston (1970) has also
stressed the importance of a hypothesis-testing framework for any classification,
since only in this way can inductive generalizations be made, which in turn will help
to advance the theoretical framework of the discipline.

SECTION 33
DISCRIMINANT ANALYSIS



Discriminant analysis is employed for a set of observations which are already
classified in some manner. Although the technique was originally developed to
allocate new observations to a set of pre-established classes on the basis of certain
characteristics, its most common use in geographic research today is as an aid in
classification (King, 1970). At any stage in the grouping process described in the
preceding section it is possible to compute the linear discriminant functions which
are linearly related to the factor scores used as input to the algorithm. The
coefficients of these functions are determined in such a way that discrimination
between the groups is maximized. Thus the method has a strong similarity to
principal components analysis, and the researcher is able to interpret the bases of
the classification (Casetti, 1964). Multiple discriminant iterations are used in the
Casetti studies to force classifications to optimal solutions, with criteria that are the
object of classification itself, i.e. that the within-group variance is minimized and
between-group variance maximized. Casetti also presents tests using chi-square
statistics for determining the quality of a classification.
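
For the simplest case of two groups described by two factor scores, the
coefficients of a linear discriminant function can be obtained as sketched below.
This is Fisher's two-group form, used here as an assumption for illustration; it
is not the multiple discriminant iteration of the Casetti studies. The scores, the
midpoint cutoff, and the function names are hypothetical.

```python
def mean_vector(rows):
    """Mean of each variable over the observations in one group."""
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]


def within_scatter(groups):
    """Pooled within-group sums of squares and cross-products (2 x 2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            dev = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += dev[i] * dev[j]
    return s


def fisher_discriminant(group_a, group_b):
    """Coefficients maximizing between-group relative to within-group spread."""
    s = within_scatter([group_a, group_b])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff: midpoint between the two group means on the discriminant axis.
    cutoff = 0.5 * (sum(wi * m for wi, m in zip(w, ma))
                    + sum(wi * m for wi, m in zip(w, mb)))
    return w, cutoff


def allocate(obs, w, cutoff):
    """Assign an observation to group 'A' or 'B' from its discriminant score."""
    score = w[0] * obs[0] + w[1] * obs[1]
    return "A" if score > cutoff else "B"


# Hypothetical factor scores for unit areas already placed in two groups.
group_a = [(2.0, 1.0), (2.5, 1.4), (3.0, 0.8), (2.2, 1.9)]
group_b = [(-1.0, -0.5), (-1.6, 0.2), (-0.8, -1.1), (-2.0, -0.4)]
w, cutoff = fisher_discriminant(group_a, group_b)
print(w, cutoff, allocate((1.8, 0.9), w, cutoff))
```

The relative size and sign of the two coefficients indicate which factor carries
most of the separation, which is the sense in which the bases of a classification
can be interpreted.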

SECTION 34
CANONICAL CORRELATION



Canonical correlation is the most general form of correlation analysis. It seeks to
maximize the covariance or correlation between two sets of original variables, say X
and Y, by computing new variates, say $U_k$ and $V_k$, which are linear combinations
of X and Y and are maximally correlated:

$U_k = \alpha_k X$ and $V_k = \beta_k Y$, where $\alpha_k$ and $\beta_k$

are the coefficients of the resulting canonical vectors. It can readily be appreciated
that this procedure has much in common with principal components analysis.
Indeed, interpretation proceeds in an analogous fashion. The coefficients are like
the loadings for different components, and the strength and sign can be used to
indicate which of the original variables are to be considered, and the direction of
their association. As indicated, the first pair of canonical vectors extracted has the
highest correlation, and subsequent pairs not only report the maximum amount of
correlation of the residual variance but are also made orthogonal to the first pair.
The researcher is therefore able to describe the independent ways in which the
relationships are specified between the two sets.

Although the technique was originally devised to account for any two sets of
variables, the applications in geography are usually to two sets of factor scores, i.e.
the intercorrelations within each set are zero. For example, Berry (1966) compares
the factor structure of Indian districts (derived from the attribute matrix) with the
structure of flows between trade blocks (from a dyadic formulation of a set of
interaction matrices). Gauthier (1968) compared levels of economic development
with changes in the transportation surface using canonical correlation. An
interesting recent development has seen the technique applied to the analysis of
trend surfaces (Lee, 1969; Monmonier, 1970).
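
For the small case of two variables in each set, the canonical correlations can be
computed without special software, as in the sketch below. It rests on the
standard result that the squared canonical correlations are the eigenvalues of the
product of the inverse within-set and the between-set covariance matrices; only
the correlations are returned, not the coefficient vectors. The data and the
function names are hypothetical, and the first variable of the second set is
deliberately constructed as an exact linear combination of the first set so that
the leading correlation can be checked.

```python
import math


def covariance_matrix(a, b):
    """Covariances between the columns of a and the columns of b."""
    n = len(a)
    mean_a = [sum(r[i] for r in a) / n for i in range(len(a[0]))]
    mean_b = [sum(r[j] for r in b) / n for j in range(len(b[0]))]
    return [[sum((ra[i] - mean_a[i]) * (rb[j] - mean_b[j])
                 for ra, rb in zip(a, b)) / (n - 1)
             for j in range(len(b[0]))]
            for i in range(len(a[0]))]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul(p, q):
    """Product of two small matrices held as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]


def canonical_correlations(x, y):
    """Canonical correlations for two sets of two variables each."""
    sxx, syy = covariance_matrix(x, x), covariance_matrix(y, y)
    sxy, syx = covariance_matrix(x, y), covariance_matrix(y, x)
    # Eigenvalues of Sxx^-1 Sxy Syy^-1 Syx are the squared canonical correlations.
    m = matmul(matmul(inverse_2x2(sxx), sxy), matmul(inverse_2x2(syy), syx))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace ** 2 - 4 * det, 0.0))
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    # Clamp to [0, 1] to guard against rounding error before taking roots.
    return [math.sqrt(min(max(ev, 0.0), 1.0)) for ev in eigenvalues]


# Hypothetical observations on two sets of two variables for six areas.
x = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 2.0), (5.0, 4.5), (6.0, 3.0)]
second = [0.5, 1.9, 0.2, 2.4, 1.1, 0.7]
y = [(3 * a - b, v) for (a, b), v in zip(x, second)]
r1, r2 = canonical_correlations(x, y)
print(round(r1, 4), round(r2, 4))
```

When both sets consist of factor scores, the within-set matrices are diagonal and
the calculation simplifies accordingly.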